ABSTRACT

A multilevel approach was used to examine
determinants of growth in grade 8 mathematics achievement in
Thailand. Data for 99 mathematics teachers and their 4,030 eighth
graders (the 14-year-old cohort) were taken from the Second
International Mathematics Study (1981-82) of the International
Association for the Evaluation of Educational Achievement. The 13
primary sampling units were the 12 national educational regions of
Thailand plus the capital, Bangkok. The schools in this sample were
equally effective in converting pretest into posttest scores; there
were essentially no variable slopes in this respect. When group and
individual effects on total variance were examined, group level
effects contributed 32% of the variance and individual effects
contributed 68% of the variance in posttest scores, with achievement
higher for: (1) boys; (2) younger students; (3) children with higher
educational aspirations; (4) those with higher self-perceptions of
ability; and (5) those with greater interest in and perceived
relevance of mathematics. The model developed explained most of the
between-school, but less of the within-school, variance. It is
suggested that schools in Thailand are more uniform in their effects
than previous research in developing countries has suggested. Nine
tables present study data. A 33-item list of references is included.
(SLD)




World Bank Discussion Papers 



A Multilevel Model 
of School Effectiveness 
in a Developing 
Country 



Marlaine E. Lockheed 
Nicholas T. Longford 






The World Bank 
Washington, D.C. 






Copyright ©1989 
The World Bank 
1818 H Street, N.W. 
Washington, D.C. 20433, U.S.A. 

All rights reserved 

Manufactured in the United States of America 
First printing January 1989 

Discussion Papers are not formal publications of the World Bank. They present preliminary and
unpolished results of country analysis or research that is circulated to encourage discussion and
comment; citation and the use of such a paper should take account of its provisional character. The
findings, interpretations, and conclusions expressed in this paper are entirely those of the author(s) and
should not be attributed in any manner to the World Bank, to its affiliated organizations, or to members
of its Board of Executive Directors or the countries they represent. Any maps that accompany the text
have been prepared solely for the convenience of readers; the designations and presentation of material
in them do not imply the expression of any opinion whatsoever on the part of the World Bank, its
affiliates, or its Board or member countries concerning the legal status of any country, territory, city, or
area or of the authorities thereof or concerning the delimitation of its boundaries or its national
affiliation.

Because of the informality and to present the results of research with the least possible delay, the
typescript has not been prepared in accordance with the procedures appropriate to formal printed texts,
and the World Bank accepts no responsibility for errors.

The material in this publication is copyrighted. Requests for permission to reproduce portions of it
should be sent to Director, Publications Department, at the address shown in the copyright notice
above. The World Bank encourages dissemination of its work and will normally give permission
promptly and, when the reproduction is for noncommercial purposes, without asking a fee. Permission
to photocopy portions for classroom use is not required, though notification of such use having been
made will be appreciated.

The complete backlist of publications from the World Bank is shown in the annual Index of Publications,
which contains an alphabetical title list and indexes of subjects, authors, and countries and regions; it is
of value principally to libraries and institutional purchasers. The latest edition is available free of charge
from Publications Sales Unit, Department F, The World Bank, 1818 H Street, N.W., Washington, D.C.
20433, U.S.A., or from Publications, The World Bank, 66, avenue d'Iéna, 75116 Paris, France.

Marlaine E. Lockheed is a senior sociologist in the Education and Employment Division of the
World Bank's Population & Human Resources Department. Nicholas T. Longford is a visiting
professor in the Department of Mathematics, University of California, Los Angeles, and a consultant to
the Bank.

Library of Congress Cataloging-in-Publication Data 

Lockheed, Marlaine E.

A multilevel model of school effectiveness in a developing country
/ Marlaine E. Lockheed, Nicholas T. Longford.

p. cm. -- (World Bank discussion papers ; 69)
Includes bibliographical references.
ISBN 0-8213-1^17-3

1. Educational evaluation--Developing countries--Case studies.
2. Academic achievement--Case studies. 3. Middle schools--Thailand-
-Evaluation. 4. Mathematics--Study and teaching (Elementary)-
-Thailand--Evaluation. 5. Education--Developing countries-
-Mathematical models. I. Longford, Nicholas T., 1955-
II. Title. III. Series.
LB2822.75.L63 1989

379.1'54--dc20 89-48775

CIP 






Abstract 



The comparative effectiveness of schools in developing countries,
particularly the relative efficiency with which alternative inputs and
management practices enhance student achievement, has become the center of a
lively debate in the literature. Of particular concern is the appropriate
analytic method to employ when examining school effects. This paper uses a
multi-level approach to examine determinants of growth in grade 8 mathematics
achievement in Thailand.

Results of the analysis showed that schools in Thailand were equally
effective in transforming pretest scores into posttest scores, and that
schools and classrooms contributed 32% of the variance in posttest scores.
Higher levels of achievement were associated with a higher proportion of
teachers qualified to teach mathematics, an enriched curriculum and frequent
use of textbooks by teachers. Individual characteristics, however,
contributed 68% of the variance, with achievement higher for boys, younger
students, and children with higher educational aspirations, less perceived
parental encouragement, higher self-perceptions of ability, and greater interest
in and perceived relevance of mathematics. The model developed in the paper
was able to explain most of the between-school variance, but significantly
less of the within-school variance. Only one variable slope, the
relationship between educational aspirations and achievement, was observed.
The implication of these results is that schools in Thailand are much more
uniform in their effects than previous research in developing countries would
have suggested.

INTRODUCTION



There are several central questions behind the research into school 
effectiveness. First, do schools make a difference in how much a student 
learns (that is, does the specific school in which a child is enrolled have a 
particular impact on his or her achievement, independent of family 
background)? Second, if so, what are the characteristics of the school that 
account for this difference? Third, do certain schools affect certain types 
of students differently than others? 

These questions, first raised by Coleman in the 1960s, have been 
reconsidered in the current research on the effectiveness of private schools 
(Coleman, Hoffer and Kilgore 1982) and by a new generation of "effective 
school" researchers (Aitkin and Longford 1986; Goldstein 1986; Raudenbush and 
Bryk 1986; Reynolds 1985; Rutter 1983; Willms 1987). The new researchers have 
investigated the questions through the application of new analytic techniques 
that take into account the hierarchical nature of most data on education: 
children within classrooms, classrooms within schools and schools within 
educational authorities (e.g., districts). 

Although appropriate methods for analyzing hierarchically
structured data on education have been available since the early 1970s
(Dempster, Laird and Rubin 1977; Lindley and Smith 1972), application of these
methods to educational policy decisions in developing countries has been
hampered by two important shortcomings: (i) the absence of computationally
efficient algorithms for multi-level analysis; and (ii) the lack of adequate
data (sufficient cases at each organizational level). Recently, new
computational methods have been developed that address the first problem
(Goldstein 1984, 1986; Longford 1987; Bryk, Raudenbush, Seltzer and Congdon,
Jr. 1986), and data sets sufficient for their application have been collected
in a number of developing countries.

This paper applies multi-level techniques to longitudinal data
recently collected by the International Association for the Evaluation of
Educational Achievement (IEA) in Thailand to answer the following questions:
(i) do Thai middle schools affect student learning differentially? (ii) what
part of the variation in student learning is attributable to between-school
characteristics versus between-student characteristics? (iii) what
characteristics of teachers and schools enhance student achievement,
independent of student background? (iv) what is the comparative effectiveness
of alternative school inputs? (v) are the effects of schools uniform across
different students? and (vi) how do estimates obtained from the new, multi-level
techniques compare with those obtained from ordinary regression methods?
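
Question (ii) asks how the variance in achievement splits between groups and individuals. As a rough illustration only, and not the estimation procedure used in this paper, the sketch below computes the between-group share of variance (the intraclass correlation) for a balanced layout with the one-way ANOVA method of moments. The function name and the simulated classrooms are hypothetical; the simulated variances are chosen so that the true share is 0.32.

```python
import random
from math import sqrt
from statistics import fmean


def between_group_share(groups):
    """Between-group share of total variance for a balanced design.

    `groups` is a list of equal-length lists of scores. Variance
    components come from one-way ANOVA mean squares; the between
    component is truncated at zero.
    """
    n_groups = len(groups)
    n_per_group = len(groups[0])
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean(group_means)  # equals the overall mean when balanced
    ms_between = (n_per_group
                  * sum((m - grand_mean) ** 2 for m in group_means)
                  / (n_groups - 1))
    ms_within = (sum((x - m) ** 2
                     for g, m in zip(groups, group_means) for x in g)
                 / (n_groups * (n_per_group - 1)))
    var_between = max((ms_between - ms_within) / n_per_group, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Hypothetical data: 99 classrooms of 40 students each, with
# classroom variance 3.2 and student variance 6.8.
rng = random.Random(0)
classrooms = []
for _classroom in range(99):
    effect = rng.gauss(0.0, sqrt(3.2))
    classrooms.append([effect + rng.gauss(0.0, sqrt(6.8)) for _student in range(40)])

share = between_group_share(classrooms)
print(f"estimated between-classroom share: {share:.2f}")
```

The estimate should land near the simulated 0.32, though it varies from sample to sample. Real survey data are rarely balanced, which is one reason likelihood-based variance component methods are generally preferred over this moment estimator.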

Background 

The comparative effectiveness of schools in developing countries,
particularly the relative efficiency with which alternative inputs and
management practices enhance student achievement, has become the center of a
lively debate in the literature (see, for example, Fuller 1987; Harbison and
Hanushek 1989; Heyneman 1986; Lockheed and Hanushek 1988). These issues have
important implications for how governments and international development
agencies should allocate their limited resources--whether they should
concentrate on certain types of inputs (capital investment or lowering class
size) or should finance others (instructional materials, teacher or headmaster
training or student testing). In the United States and the United Kingdom,
the debate was sparked by studies that claimed to identify effective schools:
those that enhanced student achievement more than other schools working with
similar students and material inputs (see Raudenbush 1987 for a recent
review).



In developing countries, research on school effectiveness has been
more limited, and studies examining the effects of alternative inputs on
student achievement have not taken into account the explicitly hierarchical
nature of the explanatory models and data. Instead, most research on
effective schools in developing countries has utilized a "production function"
approach that compares the relative effectiveness of alternative material and
non-material inputs and, to a lesser degree, teaching processes on student
achievement. The school characteristics most frequently examined have been
indicators of material inputs: per pupil expenditures, number of books,
presence of a library, presence of desks, teacher salaries and so forth. 1/
The past decade has provided several important reviews of this research
(Avalos and Haddad 1981; Fuller 1987; Heyneman and Loxley 1983; Husen, Saha
and Noonan 1978; Schiefelbein and Simmons 1981; Simmons and Alexander 1978).
Most of the reviews conclude that, when student background is controlled for,
school characteristics do have significant effects on achievement, and, in
many cases, the effects of school characteristics are greater than the effects
of family background.
Heyneman and Loxley (1983), for example, found that the variance in
student achievement explained by three family background variables averaged
8.6% across 17 developing countries, while the variance explained by school
characteristics amounted to 16%, nearly twice as great. Yet, overall, the
amount of variance in student achievement explained by variables related to
family background and school inputs in developing countries remains remarkably
low in comparison with the results of similar studies conducted in developed
countries. Heyneman (1986) has argued strongly that the failure of
conventional models to explain the variance in achievement is a consequence of
poorly conducted research. An equally strong case can be made regarding the
inadequacy of the models and indicators employed.

1/ The most extensive research using this type of model is reported in a
recent longitudinal study (Harbison and Hanushek 1989) of the effects of
material inputs on student achievement in rural Brazil.

The more recent research on school effectiveness differs from
earlier approaches in four important ways. First, education production
function research has moved away from answering the questions of whether and
how much specific material and non-material inputs affect student achievement
to exploring other questions, including the effects of alternative inputs on
achievement (e.g., Harbison and Hanushek 1989) and the mechanisms whereby
material and non-material inputs affect achievement (Lockheed, Vail and Fuller
1987). Second, better and more culturally relevant indicators of students'
social background in developing countries have been utilized (e.g., Lockheed,
Fuller and Nyirongo 1987). Third, complex organizational models of student
achievement (e.g., Rosenholtz 1989) have begun to replace education production
function models. Fourth, research has begun to center on the classroom and
classroom processes as important determinants of learning, with specific focus
on the role of teachers and administrators as managers of student learning
(e.g., Lockheed and Komenan 1989; Lockheed, Fonacier and Bianchi 1989). This
paper addresses all four issues.



Methodological Considerations 

While matters of substantive concern continue to drive the research
on effective schools, the "effective schools" issue has been fueled by
controversy over statistical methodology, interpretation and data (for
example, Sirotnik and Burstein 1985). The most important statistical issue is
the use of appropriate methods to analyze multi-level data. The argument
concerns how behavior at one level (e.g., classroom, school or district)
influences behavior at a different level (e.g., students) and how to estimate
these multi-level effects correctly. 2/

Hierarchically structured data are common in social research,
because social institutions are typically hierarchically organized. However,
the commonly used statistical techniques for dealing with related data may
lead to biased estimates. 3/ In particular, it has been established that, when
observations within clusters on any stratum are more homogeneous than those
between clusters, the use of ordinary regression methods (e.g., OLS) with such
data can lead to biased estimates of regression coefficients in unbalanced
designs and even to substantially biased standard errors for these estimates
in balanced designs. In that most policy research entails the use of
unbalanced designs, a serious problem may arise when ordinary least squares
regression estimates are used to quantify effects.

2/ These hierarchical structures result from design elements
(stratified sampling), data collection technicalities (e.g., interviewer
effect) or intrinsic interest in cross-level effects (e.g., the effects of
post-natal feeding programs on the relationship between birth weight and
subsequent cognitive development).

3/ An extended discussion of this issue is provided by Goldstein (1987).
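
The point about standard errors in balanced designs can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from this paper: it generates balanced clustered data in which a group-level predictor has no true effect, fits ordinary least squares to the pooled student records, and compares the standard error that OLS reports with the actual spread of the slope estimates over repeated samples. All names, sample sizes and variance values are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def ols_slope_and_se(x, y):
    """Simple-regression slope and its conventional OLS standard error."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(rss / (n - 2) / sxx)


def simulate(n_groups=30, n_per_group=20, var_group=0.32, var_student=0.68,
             n_reps=200, seed=1):
    """Return (empirical SD of OLS slopes, mean nominal OLS SE)."""
    rng = random.Random(seed)
    slopes, nominal_ses = [], []
    for _rep in range(n_reps):
        x, y = [], []
        for _group in range(n_groups):
            z = rng.gauss(0.0, 1.0)              # group-level predictor
            u = rng.gauss(0.0, sqrt(var_group))  # effect shared by the group
            for _student in range(n_per_group):
                x.append(z)
                # The true slope on z is zero; only noise enters y.
                y.append(u + rng.gauss(0.0, sqrt(var_student)))
        slope, se = ols_slope_and_se(x, y)
        slopes.append(slope)
        nominal_ses.append(se)
    return pstdev(slopes), fmean(nominal_ses)


empirical_sd, mean_nominal_se = simulate()
print(f"empirical SD of the slope estimates: {empirical_sd:.3f}")
print(f"average standard error reported by OLS: {mean_nominal_se:.3f}")
```

Under these settings the reported standard error is markedly smaller than the true sampling variability, because OLS treats the 600 student records as independent when only 30 groups carry information about a group-level predictor.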

Proper analysis of multi-level data requires two distinct changes in 
thinking about the data. First, the researcher must confront the demands of 
the inherently hierarchical data common to education at the stage of sample 
design, so that sufficient numbers of units at each level are sampled (e.g., 
adequate samples of schools and classrooms, in addition to the sample of 
students). Second, and more important, hierarchical analysis allows a major 
shift in how the effects of organizations on individuals may be viewed: 
instead of considering only the effects of organizational characteristics on 
organizational means, the effects on relationships are also modelled. For 
example, certain school or classroom interventions may affect not only average 
student achievement, but they may also lessen the degree of association 
between family background and student achievement. Here an organization-level
force serves to mediate an individual-level effect.
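
A conventional way to write down this shift, offered here as a generic sketch rather than as the specification estimated in this paper, is a two-level model with a group-specific intercept and slope. The symbols are assumptions introduced for illustration: y_ij is the achievement of student i in group j, x_ij is a student-level characteristic such as family background, and z_j is a group-level characteristic such as a classroom intervention.

```latex
\begin{align}
y_{ij}     &= \beta_{0j} + \beta_{1j}\, x_{ij} + e_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\, z_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\, z_j + u_{1j}
\end{align}
```

In this notation, gamma_01 is the effect of the group characteristic on the group mean, while gamma_11 is its effect on the within-group relationship between x and y. The slope is "variable" when the variance of u_1j is nonzero.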

Until recently, most discussions of multi-level analysis have
remained theoretical, bounded by the costs and computational requirements of
existing analytic tools. However, the recent development of new analytic
tools for analyzing multi-level data has energized the debate (Aitkin and
Longford 1986; Goldstein 1986; Mason, Wong and Entwisle 1984; and Raudenbush
and Bryk 1986). The development of the general EM algorithm (Dempster, Laird
and Rubin 1977) provided a theoretically satisfactory and computationally
manageable approach to estimation of covariance components in hierarchical
linear models.

To date, application of these methods in education policy research
has been limited to a relatively few studies of schools in developed
countries. To the best of the authors' knowledge, the present study is the
first such application to data from developing countries.



CHAPTER I: THE DATA 

Context 

The data used in this study come from the IEA Second International 
Mathematics Study (SIMS) in Thailand, 1981-82, and address eighth grade 
mathematics achievement. The structure of Thailand's education system
includes six primary school grades, three lower secondary school grades, three
upper secondary school grades and tertiary education. While the first six
years of schooling are compulsory, secondary education is not. At the time 
the data were collected, 33% of the 14-year-old age cohort were enrolled in 
grade eight. 



Sample 

The IEA SIMS sample consisted of 99 mathematics teachers and their 
4,030 eighth-grade students. It was derived from a two-stage, stratified 
random sample of classrooms. The 13 primary sampling units were the 12 
national educational regions of Thailand plus the capital, Bangkok. Within 
each region, a random sample of lower secondary schools was selected. At the 
second stage, a random sample of one class per school was selected from a list 
of all eighth-grade mathematics classes within the school; only students 
enrolled in school for the entire school year were included. The result was a
1% sample of eighth-grade mathematics classrooms within each region. This 
design does not distinguish between the school and classroom levels, so that 
only inferences about the aggregate of these effects are possible. 
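
The selection logic described above can be sketched as follows. This is a minimal illustration with a made-up sampling frame and a made-up sampling fraction, not the IEA procedure itself: schools are drawn at random within each region, then one class is drawn at random within each selected school.

```python
import random


def two_stage_sample(frame, school_fraction, seed=0):
    """Draw schools within each region, then one class per school.

    `frame` maps region -> {school: [class ids]}. The sampling
    fraction is a hypothetical parameter for illustration.
    """
    rng = random.Random(seed)
    sample = []
    for region in sorted(frame):
        schools = frame[region]
        n_schools = max(1, round(school_fraction * len(schools)))
        for school in rng.sample(sorted(schools), n_schools):
            # Second stage: a single class per selected school.
            sample.append((region, school, rng.choice(schools[school])))
    return sample


# Hypothetical frame: 13 regions, 10 schools each, 3 classes per school.
frame = {
    f"region_{r:02d}": {
        f"school_{r:02d}_{s:02d}": [f"class_{c}" for c in "ABC"]
        for s in range(10)
    }
    for r in range(13)
}
sample = two_stage_sample(frame, school_fraction=0.2)
print(len(sample), "classrooms sampled")
```

Because each selected school contributes exactly one class, every school identifier in the sample is paired with a single classroom, which is why school and classroom effects cannot be separated under this design.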

Method 

At both the beginning and end of the school year, students were 
administered a mathematics test covering five content areas of the curriculum 
(arithmetic, algebra, geometry, statistics and measurement). Students also
completed a short background questionnaire at the pretest and a longer one at 
the posttest administration. Teachers completed several instruments at 
the posttest, including a questionnaire on their background and one on general 
classroom processes. They also provided information about teaching 
practices and characteristics of their randomly selected "target" class. 
A school administrator provided data about the school. 

Measures 

The measures included indicators of student attitude and 
achievement, of student social class background, of material and non-material 
inputs at the school and classroom levels, and of classroom organization and 
teaching practices. The following sections provide a description of each of 
the variables analyzed in this paper (see Lockheed, Vail and Fuller 1987 for 
an extended discussion); acronyms for the variables are given in parentheses. 
For easier orientation, the acronyms for pupil-level variables are given in 
capital letters and for group-level (region/school/classroom) variables in 
underlined lower-case letters. This distinction will be clear from Tables 1 
and 2, which provide the definitions and summary statistics for all the
variables in the original data set and the data set developed as part of this 
paper. 

Mathematics achievement. The IEA developed five mathematics tests
for use in SIMS. One of the tests was a 40-item instrument called the
core test. The remaining 4 tests were 35-item instruments called rotated
forms, designated A through D. The 5 test instruments contained roughly equal
proportions of items from each of the 5 areas of curriculum content, except
that the core test contained no statistics items. For purposes of this
analysis, we regard the instruments as parallel forms with respect to
mathematical content.

The IEA longitudinal design called for students to be administered 
both the core test and one rotated form chosen at random at both the pretest 
and posttest. In Thailand, students were pretested using the core test and 
one rotated form. At the posttest, they again took the core test and one 
rotated form that was different from the rotated form taken at the pretest. 
Approximately equal numbers of students took each of the rotated forms in
both test administrations.

One goal of this analysis was to predict posttest achievement as a 
function of pretest performance and other determinants. Since students took 
the core test during the pretest, their posttest scores would reflect, to some 
degree, familiarity with the test items. For purposes of our study, instead 
of using the core test, we analyze the scores obtained from the rotated forms, 
after equating them to adjust for the differences in test length and 
difficulty. In this analysis, we use equated rotated form formula scores for 
both the pretest (XROT) and posttest (YROT) measures of student achievement in
mathematics. 4/
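
The text does not spell out how the formula scores were computed or how the forms were equated, so the following is only an illustrative sketch. It assumes the usual correction-for-guessing formula score with five answer options per item, and simple linear equating that matches the mean and standard deviation of one form to another. Both choices, along with the function names and the response counts, are assumptions rather than details taken from this paper.

```python
from statistics import fmean, pstdev


def formula_score(n_right, n_wrong, n_options=5):
    """Rights minus a fraction of wrongs; omitted items do not count.

    n_options=5 is an assumption, not a value given in the paper.
    """
    return n_right - n_wrong / (n_options - 1)


def linear_equate(scores, reference_scores):
    """Rescale `scores` so their mean and SD match `reference_scores`."""
    mean, sd = fmean(scores), pstdev(scores)
    ref_mean, ref_sd = fmean(reference_scores), pstdev(reference_scores)
    return [ref_mean + ref_sd * (s - mean) / sd for s in scores]


# Hypothetical (right, wrong) counts on two 35-item rotated forms.
form_a = [formula_score(r, w) for r, w in [(12, 20), (18, 15), (25, 8), (9, 22)]]
form_b = [formula_score(r, w) for r, w in [(10, 23), (15, 18), (22, 10), (7, 25)]]
form_b_on_a_scale = linear_equate(form_b, form_a)
print([round(s, 2) for s in form_b_on_a_scale])
```

Linear equating of this kind relies on the groups taking each form being comparable, which is plausible when forms are assigned at random as described above.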



Table 1: Sample Characteristics and Variable Names, Descriptions and Means (Proportions)
         of Student-Level Variables for Three Data Sets

                                                                       Means/Proportions
Variable                                                              Data     Data     Data
Name     Description                                                 Set 1    Set 2    Set 3

Sample
         Students                                                    2,076    2,804    3,025
         Classrooms                                                     60       80       86

Student-Level Variables
XROT     Pretest mathematics achievement score                        9.15     8.83     8.83
XSEX     Student gender (0 = female; 1 = male)                         .53      .53      .53
XAGE     Age in months                                              170.94   171.05   171.09
YFOCCI   Father's occupational status:
           Unskilled or semi-skilled worker                            .15      .15      .15
           Skilled worker                                              .44      .45      .46
           Clerical or sales worker                                    .26      .26      .25
           Professional or managerial worker                           .15      .15      .14
YMEDUC   Mother's educational attainment
           Very little or no schooling                                 .26      .26      .26
           Primary school                                              .58      .58      .58
           Secondary school                                            .09      .09      .09
           College, university or some form of tertiary ed.            .07      .07      .06
YHLANG   Use of language of instruction at home (0 = no, 1 = yes)      .49
YHCALC   Calculator at home (0 = no, 1 = yes)                          .31
YMOREED  Educational expectations
           Less than two years                                         .08      .08      .08
           Two to four years                                           .30      .31      .30
           Five to seven years                                         .41      .41      .41
           Eight or more years                                         .22      .20      .21
YPARENC  Parental encouragement (1 = high)                            2.12     2.10     2.09
YPERCEV  Perceived mathematics ability (1 = high)                     4.05     4.05     4.05
YFUTURE  Perceived future importance of mathematics (1 = low)         2.06     2.05     2.06
YDESIRE  Motivation to succeed in mathematics (1 = low)               5.47     5.47     5.47


4/ For more detail on the construction of the achievement measures, see
Lockheed, Vail and Fuller (1986).

Table 2: Sample Characteristics and Names, Descriptions and Means (Proportions)
         of Group-Level Variables for Three Data Sets

                            Data     Data     Data
                           Set 1    Set 2    Set 3

Sample
         Students          2,076    2,804    3,025
         Classrooms           60       80       86

Group-Level Variables

Variable
Name     Description

senrolt  Number of students in school ('000)
sdaysyr  Days in school year
sputear  Pupil/teacher ratio in school
squalmt  % of teachers in school qualified to teach math.
spci81   District per capita income (in 1000 bahts)
sstream  Ability groupings for instruction (0 = no; 1 = yes)
tsex     Teacher gender (0 = female, 1 = male)
tage     Teacher age in years
texptch  Years of teaching experience
tedmath  Semesters of post-secondary mathematics
tnstuds  Number of students in target class
tmthsub  Math curriculum (0 = remedial or normal, 1 = enriched)
txtbk    Frequency of use of textbook (0 = no; 1 = yes)
cefeed   Frequency of individual feedback
tadminl  Minutes spent weekly on routine administration
torderl  Minutes spent weekly maintaining class order
tseatl   Minutes students spent weekly at seat or blackboard
tvismat  Use of commercial visual materials (0 = no; 1 = yes)
tworkbk  Use of published workbooks (0 = no; 1 = yes)

Means/Proportions

     Data     Data     Data
    Set 1    Set 2    Set 3

     1.27     1.44     1.41
   195.04
    14.86    15.81    15.93
      .57      .62      .62
    12.94    12.97
      .46
      .33      .37
    29.04
     7.25
     3.95
    43.61    42.61
      .55      .56      .58
     2.15
    26.84
    19.40    20.27    20.33
    53.76    54.57
      .34      .40
      .85      .83

Student background characteristics. The basic background
information about each student included his or her gender (XSEX), age in
months (XAGE), paternal occupational status (YFOCCI), highest maternal
education (YMEDUC), home language (YHLANG) and home use of a four-function
calculator (YHCALC). Paternal occupation (YFOCCI) was classified into four
categories: (i) unskilled or semi-skilled worker, (ii) skilled worker,
(iii) clerical or sales worker, and (iv) professional or managerial worker.
Maternal education (YMEDUC) was classified into four categories: (i) very
little or no schooling, (ii) primary school, (iii) secondary school, and
(iv) college, university or some form of tertiary education.

Student attitudes and perceptions. Five indices of student
attitudes and perceptions were included. Student educational
expectations (YMOREED) were measured by a single item that asked about the
number of years of full-time education the student expected to complete
after the current academic year. The following categories were defined:
(i) less than two years, (ii) two to four years, (iii) five to seven years,
and (iv) eight or more years. Parental encouragement (YPARENC) was measured by
a four-item index composed of responses on a Likert-type scale in which
students described their parents' interest in, and encouragement for,
mathematics achievement. For example, for the item "My parents encourage
me to learn as much mathematics as possible," the response alternatives ranged
from "Exactly like" the student's parents (= 1) to "Not at all like" the
student's parents (= 5). The four items comprised a single factor, with
principal component factor loadings ranging from .72 to .83 and communality
of 2.43. A low score represented greater parental support. Perceived
mathematics ability (YPERCEV), perceived usefulness of mathematics
(YFUTURE) and motivation toward mathematics achievement (YDESIRE) were all
developed from a factor analysis of the student attitude survey, which
contained Likert-type items having response alternatives ranging from
"strongly disagree" (= 1) to "strongly agree" (= 5). The factors were
initially identified through varimax factor analyses and then confirmed
through principal component analyses, from which the factor scores were
constructed. For YPERCEV, a low value represented a positive attitude; for
YFUTURE and YDESIRE, a high value represented a positive attitude.

School characteristics. This study looks at data on six school
characteristics. Five are conventional indicators of material and
non-material inputs: (i) school size in terms of the total number of students
enrolled (senrolt), an indicator of potential resources; (ii) length of the
school year in days (sdaysyr), an indicator of the time available for
instruction; (iii) student/teacher ratio in the school (sputear), an indicator
of the availability of teacher resources for the student; (iv) percentage of
the teaching staff qualified to teach mathematics (squalmt), an indicator of
the quality of teacher resources; and (v) per capita income in 1981 at the
district level (sr ci81), another indicator of resources. One measure of
school organization is included: (vi) presence of ability grouping (sstream).

Teacher characteristics. Four teacher characteristics are analyzed:
(i) gender (tsex); (ii) age (tage); (iii) teaching experience (texptch); and
(iv) number of semesters of post-secondary mathematics education (tedmath).
The latter two variables are conventional indicators of teacher quality.

Classroom characteristics. Three characteristics of the classroom
are analyzed: (i) class size (tnstuds), an indicator of the teacher resources
available to the student in his/her mathematics class; (ii) remedial or
typical versus enriched mathematics subject matter (tmthsub), an indicator of
the quality of the curriculum for the student in a particular class; and
(iii) whether or not the teacher used textbooks frequently in the class
(txtbk), an indicator of the availability of instructional materials in the
classroom.

Teaching practices. Six variables referring to teaching practices
are considered: (i) providing feedback to students (cefeed), a composite index
of five elements of teaching practice: commenting on student work, reviewing
tests, correcting false statements, praising correct statements and giving
individual feedback; (ii) number of minutes per week the teacher spent on
routine administration (tadminl); (iii) maintaining class order (torderl);
(iv) monitoring assigned seatwork (tseatl); (v) using commercially produced
visual materials (tvismat); and (vi) using workbooks (tworkbk). All
information on variables related to teaching practices was self-reported.

In summary, the data set contains information on 32 variables about 
4,030 pupils from 99 schools. Of the 32 variables, 13 involve student 
characteristics, 5 refer to the school, 4 to the teacher, 9 relate to the 
classroom, and 1 is a characteristic of the district (catchment area) . The 
distinction between the variables related to pupils and to 
classrooms/teachers/schools (henceforth called groups, since they are 
confounded in the design) is important because they play different roles in 
explaining variations in achievement. 5/

5/ It should be noted that the complete data set consists of 13*4,030 + 19*99
= 54,271 units of data, although conventionally it would be conceived, and
stored on a computer, as a data set of 32*4,030 = 128,960 units of data.
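
The arithmetic in this footnote can be checked directly. The short sketch below is an illustration only; the variable names are ours and do not come from the paper.

```python
# Counts taken from the text: 13 pupil-level and 19 group-level variables,
# 4,030 pupils and 99 groups.
n_pupils, n_groups = 4030, 99
pupil_vars, group_vars = 13, 19

# Hierarchical view: each group-level value is stored once per group.
hierarchical_units = pupil_vars * n_pupils + group_vars * n_groups

# Conventional flat file: every variable is stored once per pupil.
flat_units = (pupil_vars + group_vars) * n_pupils

print(hierarchical_units)  # 54271
print(flat_units)          # 128960
```

The flat file repeats each of the 19 group-level values for every pupil in the group, which is why it holds more than twice as many units as the data actually contain.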
The data contain relatively more information about the groups (19
variables for 99 units) than about the pupils (13 variables for 4,030 units).
Arguably, the group-level variables are also more reliable because they refer
to school or teacher records and are responses from adult professionals,
whereas the responses of pupils are subject to test-performance variation,
recall of family circumstances and arrangements, varying interpretations of
the questionnaire items and so on. Moreover, the pupil-level variables, e.g.,
XROT, have a large group-level component of variation; groups vary a great
deal in their composition (means, standard deviations, etc.) of these
variables. Hence, not only the 19 group-level variables, but also, to some
extent, the 13 pupil-level variables potentially explain group-level variation
among the 99 groups, whereas only the 13 pupil-level variables explain some of
the pupil-level variation in the outcome scores of the 4,030 pupils.



CHAPTER II: MODELS 

Variance Component Models 

The hierarchical structure of the data, with pupils nested within 
groups, requires a form of regression analysis that takes into account the two 
separate sources of variation in achievement. Separation of the variation 
attributable to pupils and to schools/classrooms is also of substantive 
interest, because the latter is a measure of the size of unexplained 
differences among schools/classrooms.

Goldstein (1986), Raudenbush and Bryk (1986) and Aitkin and Longford
(1986) have established the relevance of variance component methods for
analyzing data with hierarchies. They address the previously mentioned
problems with the use of ordinary regression methods when the assumption of
independence of the observations is not satisfied.

Analytical Framework 

Educational surveys involve hierarchically structured data: pupils
within classrooms within schools within administrative units or regions.
Every classroom (school, region) has its own idiosyncratic features that 
result from a complex of influences, including composition, teaching practices 
and management decisions. As a consequence, observations on students (e.g., 
their outcomes) are not statistically independent, not even after taking into 
account the available explanatory variables. This condition violates the 
assumption of independence for ordinary regression (OLS). 

By comparison, variance component models are an extension of
ordinary regression models that allow more flexible modelling of variation:
within school or classroom and between schools or classrooms. Pupils are
associated with (unexplained) variation, but this variation has a consistent
within-classroom component that itself has a within-school component, etc.
Schools vary, classrooms within schools vary and pupils within classrooms
vary. Consider the regression model for data with two levels of hierarchy
(pupils i within classrooms j):

$y_{ij} = \alpha + \beta x_{ij} + \gamma z_{ij} + \varepsilon_{ij}$    (1)

where $\alpha$, $\beta$ and $\gamma$ are (unknown) regression parameters, x and z
are explanatory variables, y is the outcome measure and the random term
$\varepsilon$ is assumed to be a random sample from a normal distribution with a
mean of zero and an unknown variance $\sigma^2$. Variation among the classrooms
can be accommodated in the "simple" variance component model:

$y_{ij} = \alpha + \beta x_{ij} + \gamma z_{ij} + a_j + \varepsilon_{ij}$    (2)

where the a's form a random sample from a normal distribution with a mean of
zero and an unknown variance $\tau^2$, and the a's and the $\varepsilon$'s are
mutually independent. The covariance of two pupils within a classroom is
$\tau^2$ (correlation $\tau^2/[\tau^2 + \sigma^2]$). If we knew the a's, we could
use them to rank the classrooms. Model (2) has the form of analysis of
variance (ANOVA) with distributional assumptions imposed on the a's. The
advantages of this assumption are discussed by Dempster, Rubin and Tsutakawa
(1981), who use the term "borrowing strength" in estimating the effects of
small groups, and by Aitkin and Longford (1986).

In this model, each school has a uniform effect on the pupils within 
it. As this assumption may be unrealistic, a more flexible model is needed 
that allows not only the school means but also the school regression 
coefficients to vary, as some schools may be more "suitable" for pupils with 
certain backgrounds than others. This corresponds to variation in the 
within-school regressions of y on x and z. This situation can be suitably 
modelled as 



$y_{ij} = \alpha + \beta x_{ij} + \gamma z_{ij} + a_j + b_j x_{ij} + c_j z_{ij} + \varepsilon_{ij}$    (3)

or

$y_{ij} = \alpha + \beta x_{ij} + \gamma z_{ij} + a_j + b_j x_{ij} + \varepsilon_{ij}$.    (4)

The classroom-level random effects $(a_j, b_j)$ are assumed to be a
random sample from a normal distribution with a mean of zero and an unknown
variance matrix $\Sigma^{(2)}$. Here $\Sigma^{(2)}$ involves three parameters:
the variances of a and b and their covariance. Extensions to larger numbers
of explanatory variables and to more complex hierarchies are described in
the literature (e.g., Goldstein 1987; Longford 1987; Raudenbush and Bryk
1986).

The maximum likelihood estimation procedures for such models used
in this paper are based on the Fisher scoring algorithm (Longford 1987)
implemented in the software VARCL (Longford 1986). It provides estimates of
regression parameters and (co-)variances, together with standard errors for
them, and the value of the log-likelihood.

Variance Component Models Compared with OLS 

Variance component methods involve the explicit modelling of student
and group variation and afford flexibility in modelling the group variation,
something that ordinary regression cannot do. The specification of a variance
component model is necessarily more complex than is the case with ordinary
regression. In standard situations, the analyst first declares the list of
the regression variables involved in explaining the outcome for a typical
group. Next the analyst declares a sublist of this list that contains the
variables for which the within-group relationships are hypothesized to vary
from group to group. The full list of variables, referred to as the "fixed
part," is analogous to the list of the explanatory variables in ordinary
regression. The sublist (random part) may contain only pupil-level variables,
that is, variables that take on different values for students attending the
same class. Variables measured at the classroom level whose values are
constant for all students in a classroom cannot be specified in the random
part of the model, because within-group regression coefficients on group-level
variables cannot be identified.
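
The rule for the random part can be expressed as a simple check on the data. The sketch below is illustrative only and is not part of VARCL: the function names, the record layout and the toy values are assumptions made for the example.

```python
from collections import defaultdict


def varies_within_groups(rows, group_key, variable):
    """Return True if `variable` takes more than one value inside at least one group."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_key]].add(row[variable])
    return any(len(values) > 1 for values in seen.values())


def check_random_part(rows, group_key, fixed_part, random_part):
    """Raise ValueError unless the random part is an admissible sublist of the fixed part."""
    for name in random_part:
        if name not in fixed_part:
            raise ValueError(f"{name} is in the random part but not in the fixed part")
        if not varies_within_groups(rows, group_key, name):
            raise ValueError(f"{name} is constant within every group")


# Toy records (hypothetical values): XROT varies within a class,
# whereas class size (tnstuds) is constant within a class.
rows = [
    {"class_id": 1, "XROT": 7.0, "tnstuds": 40},
    {"class_id": 1, "XROT": 12.0, "tnstuds": 40},
    {"class_id": 2, "XROT": 9.0, "tnstuds": 35},
    {"class_id": 2, "XROT": 15.0, "tnstuds": 35},
]

# XROT is admissible in the random part, so this call passes silently.
check_random_part(rows, "class_id", ["XROT", "tnstuds"], ["XROT"])
```

Requesting a variable slope on tnstuds with the same records would raise an error, which mirrors the identification problem described above.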

Variance component models involve two kinds of parameters. The
fixed effects parameters refer to the regression relationship for the average
group. Their interpretation is analogous to the regression parameters in
ordinary regression. The random effects parameters are variances and
covariances that describe the between-group variation in the regression
relationship. Of prime interest are the sizes of the variances. Zero
variance of a regression coefficient corresponds to a constant relationship
across the groups. To obtain information about the variation, we require, in
general, a substantially larger number of pupils and groups than we do for the
regression parameters. We can therefore expect to find that a small random
part, containing only a few variables, provides a sufficient description of
the variation, whereas the fixed part may contain most of the available
explanatory variables.

One important aspect of the separation of the two sources of 
variation is the ability to distinguish between pupil- and group-level 
variation. This aspect comes out very clearly in the following examples: it 
turns out that we have abundant group-level information, i.e., a good 
description of the between-group variation, but a much larger proportion of 
the student-level variation remains unexplained. 

To fix ideas, we consider first a specific model:

$y_{ij} = \sum_k x_{ij,k} \beta_k + d_j + \varepsilon_{ij}$    (5)

where the indices $i = 1, \ldots, n_j$, $j = 1, \ldots, N_2$ and
$k = 0, 1, \ldots, K$ represent the pupils, groups and variables,
respectively. The $\beta$'s are the regression parameters, and the d's and
$\varepsilon$'s are the group- and pupil-level random effects, assumed to be
independent random samples from the normal distribution with zero means; the
pupil-level variance is $\sigma^2$ and the group-level variance is $\tau^2$.
We will assume throughout that $\beta_0$ is the intercept, i.e.,
$x_{ij,0} = 1$. Analogously with the ordinary regression, we can define the
$R^2$ as the proportion of variation explained:

$R^2 = 1 - (\sigma^2 + \tau^2)/(\sigma^2_{raw} + \tau^2_{raw})$,    (6)

where the subscript "raw" refers to the variance estimates in the "empty"
variance component model:

$y_{ij} = \mu + d_j + \varepsilon_{ij}$.    (7)

It is advantageous, however, to define two separate $R^2$s that refer
to the two levels of the hierarchy for pupils and groups, respectively:

$R_p^2 = 1 - \sigma^2/\sigma^2_{raw}$    (8)

$R_g^2 = 1 - \tau^2/\tau^2_{raw}$.    (9)

CHAPTER III: SCHOOL EFFECTS ON MATHEMATICS LEARNING



Two questions that educators frequently ask are how much student 
achievement increases over the course of a year and whether schools affect 
growth in achievement differentially. In this section, we use the pretest 
(XROT) and student posttest (YROT) to address these questions. We also 
demonstrate, using simple examples from the data, the differences between 
ordinary regression, simple variance component analysis and variance component 
analysis using random coefficients. In the next section on the results of our 
analysis, we apply these techniques to the complete data set, using more 
complex models.

Model 1: Ordinary Regression (OLS)

In the present analysis, for a data set obtained by listwise
deletion with respect to a set of variables considered below (a procedure that
leaves 3,136 pupils in 88 schools), we have for the simple ordinary regression
of posttest (YROT) on pretest (XROT), as per equation (1) with a single
explanatory variable,

$y_{ij} = \alpha + \beta x_{ij} + \varepsilon_{ij}$    (10)

and

YROT = 4.892 + .818 XROT.    (11)
              (.015)

In this model, identification of pupils within schools is completely
ignored; instead, the pupils are assumed to be a randomly drawn sample from
the population of all pupils in the given grade in the country. A pupil with
a given pretest score XROT is expected to score 4.892 + .818 XROT on the
posttest. The standard errors for the regression estimates will be given
throughout the paper in parentheses in the line below the regression
parameters. For example, .015 above is the standard error for the regression
coefficient on XROT, .818. The corresponding t-ratio is .818/.015 = 54.5.

The computation of $R^2$ follows:

$\sigma^2_{raw} = 82.80$
$\sigma^2 = 42.56$,

so that $R^2 = 1 - \sigma^2/\sigma^2_{raw} = 1 - (42.56/82.80) = .486$.
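
The t-ratio and the R-squared quoted for the ordinary regression can be reproduced from the reported estimates. The sketch below uses only figures given in the text; the variable names are ours.

```python
# Slope on XROT and its standard error, from equation (11).
coef, se = 0.818, 0.015

# Residual variance of YROT without and with XROT in the regression.
sigma2_raw, sigma2 = 82.80, 42.56

t_ratio = coef / se
r_squared = 1 - sigma2 / sigma2_raw

print(round(t_ratio, 1))    # 54.5
print(round(r_squared, 3))  # 0.486
```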

Model 2: (Simple) Variance Component Model (VCS)

To take into account the group-level variables, we choose a simple
variance component model ("simple" in that it does not contain variable
slopes):

$y_{ij} = \mu + d_j + \varepsilon_{ij}$    (12)

$\sigma^2_{raw} = 55.56$
$\tau^2_{raw} = 25.65$.

The variation in posttest scores has a substantial group-level
component. That is, the "total" variance is 81.21 (55.56 + 25.65), of which
.316 (25.65/81.21), the variance component ratio, is attributed to group-level
effects. The variance component regression model is given as:

YROT = 5.841 + .699 XROT    (13)
              (.018)

$\sigma^2 = 38.55$
$\tau^2 = 4.78$,

so that we have $R^2 = 1 - (43.33/81.21) = .466$, and

$R_p^2 = 1 - 38.55/55.56 = .306$
$R_g^2 = 1 - 4.78/25.65 = .814$.

Thus, if we make allowances for the within-school correlation of the
posttest scores, we obtain a prediction formula for the posttest score
(YROT = 5.841 + .699 XROT) that is substantially different from the OLS
regression described in equation 11. Note, also, by how much the school-level
variation has been reduced.
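
The three R-squared measures for the simple variance component model follow directly from the variance estimates with and without the pretest. The sketch below recomputes them; the function name r2_components is ours and is used only for illustration.

```python
def r2_components(sigma2, tau2, sigma2_raw, tau2_raw):
    """Return (overall, pupil-level, group-level) proportions of variance explained."""
    overall = 1 - (sigma2 + tau2) / (sigma2_raw + tau2_raw)
    pupil = 1 - sigma2 / sigma2_raw
    group = 1 - tau2 / tau2_raw
    return overall, pupil, group


# Variance estimates reported for the model with XROT and for the empty model.
overall, pupil, group = r2_components(
    sigma2=38.55, tau2=4.78, sigma2_raw=55.56, tau2_raw=25.65
)

print(round(overall, 3))  # 0.466
print(round(pupil, 3))    # 0.306
print(round(group, 3))    # 0.814
```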

Table 3 presents the comparison between the simple OLS and simple
variance component models. Clearly, the latter extension of the $R^2$ for
variance components is more informative. The pretest score XROT is a powerful
predictor of the posttest score YROT. However, whereas it explains more than
80% of the variation among the groups, it explains only 30% of the pupil-level
variation. The school-level variation in the outcome scores reflects the
pretest score to a great extent. Some of the remaining within-group variation
may be explained by the other explanatory variables, but they are not likely
to have as dominant an effect as the pretest score does.
The variation associated with the testing and scoring procedure,
which could be demonstrated in an experiment with repeated administration of
the test, use of alternate forms, etc., will remain as a component of the
pupil-level variation. Thus, whereas the group-level variation can
potentially be reduced to 0, the pupil-level variation has a component that
cannot be explained by any explanatory variables. In ideal circumstances (and
in our case, almost), we can explain completely why/how schools vary; the
variance of schools in the later models is very small. We cannot, however,
explain the pupil-level variation completely; there will always be an
unexplainable within-pupil variation because of fluctuations in performance,
distractions, guessing and so on. Since every pupil provides only one outcome
score, the within-pupil and within-group variation cannot be separated.

The raw variance component ratio is .316, but with the model with
the pretest score, the ratio drops to .110. If the pretest score is ignored,
the groups appear to have substantial differences. At the same time, the
schools appear to be much more similar (homogeneous) once we take account of
the pretest scores, i.e., they are much more similar in the way they "convert"
initial ability into outcome.
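
The two variance component ratios quoted above are the group-level share of the total variance before and after the pretest enters the model. A minimal sketch using the reported variance estimates (the function name is ours):

```python
def variance_component_ratio(tau2, sigma2):
    """Share of the total variance that lies between groups."""
    return tau2 / (tau2 + sigma2)


raw_ratio = variance_component_ratio(tau2=25.65, sigma2=55.56)      # empty model
adjusted_ratio = variance_component_ratio(tau2=4.78, sigma2=38.55)  # model with XROT

print(round(raw_ratio, 3))       # 0.316
print(round(adjusted_ratio, 3))  # 0.11
```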
Table 3: Comparison of OLS and VCS Models
of Grade 8 Mathematics Posttest Predicted from the Pretest,
Thailand, 1981-82

                                   Method
Models                         OLS        VCS

Empty model
  $\sigma^2_{raw}$            82.80      55.56
  $\tau^2_{raw}$                -        25.65

Regression model
  Intercept                   4.892      5.841
  Coefficient                 0.818      0.699
  St. error coeff.            0.015      0.018
  $\sigma^2$                  42.56      38.55
  $\tau^2$                      -         4.78
  $R^2$                       0.486
  $R_p^2$                       -        0.306
  $R_g^2$                       -        0.814

If a group-level explanatory variable were added to the regression 
model, it would result in a reduction of only the group-level variance, which 
has already been substantially reduced. Therefore there is less scope for 
important group-level explanatory variables than for pupil-level ones. Among 
the pupil-level variables there might be ones that explain a great deal of the 
remaining pupil-level variation. 

Inclusion of a pupil-level variable in the regression model will 
cause a reduction in both the pupil- and group-level variances. The relative
sizes of the reductions of the two variances will depend on how the variation
in the explanatory variable decomposes into between- and within-group
variance. Hence, potentially the most important pupil-level explanatory
variables are those with little between-group variation.

Model 3: Variable Slopes Model

The variance component model discussed above can be further
generalized into a model that allows variable slopes on the pretest:

$y_{ij} = \beta_0 + \beta_1 x_{ij} + d_{0j} + d_{1j}(x_{ij} - \bar{x}) + \varepsilon_{ij}$    (14)

where $(d_{0j}, d_{1j})$ form a random sample from a normal distribution with
a mean of zero and an unknown variance matrix, $\Sigma_d$; $\bar{x}$ is the
sample mean for x; and the $\varepsilon$'s are a random sample from a normal
distribution with a mean of zero and an unknown variance, $\sigma^2$. The
maximum likelihood estimates for this model are:

$\beta_0 = 5.832$
$\beta_1 = .687$ (.019)
$\sigma^2 = 38.367$
$\Sigma_d = \mathrm{Var}(d_{0j}, d_{1j}) = \begin{pmatrix} 4.947 & \\ .0805 & .00416 \end{pmatrix}$.

The software VARCL used for maximum likelihood estimation in variance
component models estimates the square roots of the variances in
$\Sigma_d$ and produces standard errors for these estimates:

$\Sigma_{d,11}^{1/2} = 2.224$ (.202)
$\Sigma_{d,22}^{1/2} = .0645$ (.0338)
$\Sigma_{d,12} = .0805$ (.0311)
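
The square roots reported by VARCL can be checked against the variances in the estimated matrix. The sketch below does so and also computes the intercept-slope correlation implied by the estimates; that correlation is our derived quantity and is not reported in the paper.

```python
import math

# Group-level (co)variance estimates for the variable slopes model.
var_intercept, var_slope, cov = 4.947, 0.00416, 0.0805

sd_intercept = math.sqrt(var_intercept)
sd_slope = math.sqrt(var_slope)

# Correlation implied by the estimates (derived here, not taken from the paper).
implied_corr = cov / (sd_intercept * sd_slope)

print(round(sd_intercept, 3))  # 2.224
print(round(sd_slope, 4))      # 0.0645
print(round(implied_corr, 2))  # 0.56
```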



Model 4: Comparison of the Models 

Now we test Model 3 against Models 2 and 1. First, we compare Model
3 and Model 2. The value of the deviance (-2 log-likelihood) 6/ for Model 3
is 20,496.3. Using the conventional t-ratio, we conclude that the slope
variance $\Sigma_{d,22}$ is not significantly different from 0, so that we can
adopt the simple variance component model.

More formally, we can use the likelihood ratio test to compare the
two variance component models. The deviance for the simple Model 2 is
20,499.9, which is 3.6 higher than in the case of the variable slopes Model 3.
To determine the significance of this difference, it is necessary to determine
the number of degrees of freedom from the "free" parameters. The simpler
model is obtained from the latter model by constraining to zero the slope
variance $\Sigma_{d,22}$ and the slope-by-intercept covariance
$\Sigma_{d,12}$; these are the two additional free parameters that set the
degrees of freedom equal to 2. Hence the statistic $\chi^2$ has 2 degrees of
freedom, and we can declare that we have found insufficient evidence for a
variable slope of the posttest on the pretest among the schools. That is, the
schools are fairly uniform in their conversion of pretest scores into
posttest scores.

6/ This statistic is used to assess how well the model represents the data.
For two models where one is a special case of the other, the difference of
their deviances has a chi-square distribution, with the number of degrees of
freedom equal to the difference in the number of free parameters in the two
models.

Next we compare the simple variance component model (Model 2) with
the ordinary regression model (Model 1). The differences among the schools,
described by the variance in the simple variance component model, are
substantial and statistically significant; the formal likelihood ratio test
for the hypothesis that $\tau^2 > 0$ is obtained by comparing the deviances of
the ordinary regression and the simple variance component models. The ordinary
regression deviance (-2 log-likelihood, which is not the same as the residual
sum of squares) is equal to 20,662.6, 162.6 higher than the deviance for the
simple variance component model ($\chi^2$ with 1 degree of freedom). Therefore
we reject the ordinary regression model in favor of the variance component
model. Further, the t-ratio for $\tau$ is large.
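
The two likelihood ratio comparisons can be turned into tail probabilities with closed-form chi-square formulas for 1 and 2 degrees of freedom. The sketch below is an illustration under that restriction: the helper chi2_sf is ours, it handles only those two cases, and the p-values it prints are not reported in the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (closed forms for df = 1, 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# Model 2 versus Model 3: deviances 20,499.9 and 20,496.3, 2 degrees of freedom.
p_slopes = chi2_sf(20499.9 - 20496.3, df=2)

# Model 1 versus Model 2: deviance difference reported as 162.6, 1 degree of freedom.
p_groups = chi2_sf(162.6, df=1)

# Conventional t-ratio for the square root of the slope variance, .0645 (.0338).
t_slope = 0.0645 / 0.0338

print(round(p_slopes, 3))  # 0.165
print(p_groups < 1e-30)    # True
print(round(t_slope, 2))   # 1.91
```

A tail probability of about .17 for the variable slopes is consistent with retaining the simple model, while the group-level variance is significant at any conventional level.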

Making inferences about relationships that vary from group to group 
is of substantive importance in studies of school effectiveness. Schools are 
expected to vary in their performance after accounting for differences in the 
initial ability of the pupils, but other more complex patterns of 
between-school variation may arise: schools may be relatively more successful 
in teaching children with certain background characteristics, and they may 
either exaggerate or reduce the differences among the pupils at enrollment. 

The relationships among variables are intimately connected with 
variance heterogeneity. By way of illustration, we consider the variable 
slope model discussed above. The fitted variance of an observation is

38.367 + 4.947 + 2*(XROT - 8.912)*.08054    (15)
    + (XROT - 8.912)^2 * .00416.

It is a quadratic function of the pretest. The minimal variance occurs for
XROT* = 8.912 - .0805/.0042 = -10.45 and is equal to 41.75. Only two pupils
in the whole sample have scores lower than XROT*. Larger values of the
explanatory variable XROT are associated with larger variance. For XROT = 9
(near the mean), the fitted variance is 43.33, and for XROT = 30 (near the
sample maximum), the fitted variance is 48.56. It would appear that for
low-ability pupils, the choice of school is slightly less important than for
high-ability pupils. We have to bear in mind, however, that we are dealing
with an observational study, not with an experiment, and in reality pupils, or
their parents, do not have complete freedom of choice over the school. Thus a
causal statement, or a prediction about a future manipulative procedure, can
be made only under the condition that all the other circumstances in the
educational system remain intact. This assumption is usually very
unrealistic.
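
The fitted variance in equation (15) is easy to evaluate at any pretest score. The sketch below uses the unrounded covariance (.08054) and slope variance (.00416) from equation (15); the constant and function names are ours.

```python
SIGMA2 = 38.367        # pupil-level variance
VAR_INTERCEPT = 4.947  # group-level intercept variance
COV = 0.08054          # slope-by-intercept covariance
VAR_SLOPE = 0.00416    # group-level slope variance
XBAR = 8.912           # sample mean of XROT


def fitted_variance(xrot):
    """Fitted variance of a posttest observation at pretest score `xrot`."""
    d = xrot - XBAR
    return SIGMA2 + VAR_INTERCEPT + 2 * d * COV + d * d * VAR_SLOPE


# The quadratic is minimized where its derivative in XROT is zero.
x_min = XBAR - COV / VAR_SLOPE

print(round(x_min, 2))                   # -10.45
print(round(fitted_variance(x_min), 2))  # 41.75
print(round(fitted_variance(9), 2))      # 43.33
print(round(fitted_variance(30), 2))     # 48.56
```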



Summary 

The comparison of the regression relationship (fixed effects) is
instructive. We have

(i) Ordinary regression

YROT = 4.892 + .818*XROT
              (.015)

(ii) Simple variance component model

YROT = 5.841 + .699*XROT
              (.018)

(iii) Variable slopes

YROT = 5.832 + .687*XROT.
              (.019)

The estimate of the regression coefficient on XROT in ordinary regression is
substantially different from the estimates in the two variance component
models. Ignoring the hierarchical structure of the data would lead to
different conclusions, say, in predicting the posttest (YROT) from the pretest
(XROT). In other words, whereas the OLS estimate could be interpreted to
mean that each point on the pretest is worth .82 points on the posttest, the
VCS estimate more accurately places this value at .69 points.
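
To see what the difference between the fitted lines means in practice, the predictions can be tabulated for a few pretest scores. The sketch below uses the reported OLS and simple variance component estimates; the pretest values chosen are arbitrary illustrations.

```python
def predict_ols(xrot):
    """Predicted posttest score from the ordinary regression fit."""
    return 4.892 + 0.818 * xrot


def predict_vcs(xrot):
    """Predicted posttest score from the simple variance component fit."""
    return 5.841 + 0.699 * xrot


for xrot in (0, 9, 20, 30):
    gap = predict_ols(xrot) - predict_vcs(xrot)
    print(xrot, round(predict_ols(xrot), 2), round(predict_vcs(xrot), 2), round(gap, 2))
```

The two lines are close near the sample mean of the pretest and diverge for high scores, where the OLS line predicts a higher posttest score than the variance component model does.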



CHAPTER IV: PUPIL BACKGROUND AND SCHOOL/CLASSROOM EFFECTS ON LEARNING

Overview

In this section we use the complete data set to estimate the
effects of student background and school/classroom variables on achievement in
mathematics. The approach taken is often referred to as a "value-added"
approach, since the purpose is to explain posttest achievement after the
effects of prior learning (pretest achievement) have been taken into account.
Our intent is to obtain the most parsimonious simple variance component model
of grade eight mathematics learning in Thailand, given the data.

Because of missing data, we build the model conservatively, as
follows. First, we start with the data set obtained by listwise deletion with
respect to all 32 variables (including the outcome YROT and the pretest XROT),
fit a regression model to this data set, and apply a conservative criterion
(to be specified below) to exclude variables from the obtained regression
formula, so that we end up constructing a restricted set of explanatory
variables. We apply listwise deletion to this restricted set of variables, a
process that leads to a larger sample of pupils and schools. For this new
data set, we again fit the regression model, simplify the regression formula,
if possible, and continue until no further reduction of the set of
variables and extension of the data set obtained by listwise deletion are
possible.
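
The model-building loop described above can be sketched as follows. This is a schematic illustration, not the authors' procedure in code: the record layout (dictionaries with None for missing values) and the select callback, which stands in for fitting the regression and applying the exclusion criterion, are assumptions made for the example.

```python
def listwise_delete(records, variables):
    """Keep only records with no missing (None) value on any of `variables`."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def build_sample(records, variables, select):
    """Alternate listwise deletion and variable selection until nothing changes.

    `select(sample, variables)` is a caller-supplied stand-in for fitting the
    model and excluding variables; it must return a subset of `variables`.
    """
    variables = list(variables)
    while True:
        sample = listwise_delete(records, variables)
        chosen = set(select(sample, variables))
        kept = [v for v in variables if v in chosen]
        if kept == variables:
            return sample, variables
        variables = kept  # fewer variables, so the next sample can only grow


# Toy data (hypothetical): "z" is a variable that the selection rule excludes.
records = [
    {"YROT": 10, "XROT": 8, "z": None},
    {"YROT": 12, "XROT": None, "z": 1},
    {"YROT": 15, "XROT": 11, "z": 0},
]
drop_z = lambda sample, variables: [v for v in variables if v != "z"]

sample, variables = build_sample(records, ["YROT", "XROT", "z"], drop_z)
print(len(sample), variables)  # 2 ['YROT', 'XROT']
```

In the toy run, only one record is complete on all three variables; once "z" is excluded, a second record becomes usable, which is the gain in sample size the procedure is designed to capture.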

Usually it cannot be assumed that the unavailable data are missing
at random, i.e., that the distribution of a variable among the pupils from
whom we obtain valid responses is similar to the distribution among the pupils
whose responses are not available (missing). In educational surveys, typically
higher ability pupils, those with higher social status, etc., tend to have
higher response rates, the implication being bias in the estimates of certain
population means, as well as in the regression coefficients obtained from
simple regression. Missingness at random is an unnecessarily stringent
criterion for ensuring that the omission of the subjects with missing data has
no effect on the results of a regression analysis. It is sufficient to have
conditional randomness, given the explanatory variables. It means that for
any combination of explanatory variables, the distribution of the outcome
among the pupils in the sample is identical to that for those excluded from
the sample by the listwise deletion procedure. Intuitively, such an
assumption becomes less stringent the more explanatory (conditioning)
variables are used. On the other hand, a larger set of explanatory variables
implies a larger proportion of subjects whose data are not used in the
analysis.

An indication of the extent to which the criterion of conditional 
randomness is relevant can be deduced from comparisons of model fits for two
different samples: the maximal sample obtained by listwise deletion with 
respect to the set of explanatory variables used in the considered model, and 
the sample obtained by listwise deletion with respect to a more extensive, or 
complete, set of explanatory variables. In a few such comparisons, reported 
below, we find close agreement in several pairs of such analyses. 

Multiple Regression Models

The response rate for the 13 pupil-level variables ranges from 93% to
100%. There is no obvious pattern of missingness among the pupils;
complete pupil-level records are available for 3,466 individuals (86%). The
group-level data are available for between 78 and 99 schools, but only 60
schools have complete records, and within these schools, only 2,076 pupils
also have complete pupil-level data (51.5%). We begin by fitting the simple
variance component models (VCS), i.e., models involving no variable slopes, to
the data set.

First model: Regression with all variables. Listwise deletion with
respect to all 32 available variables results in a data set containing 2,076
pupils in 60 schools. The ordinary regression fit (OLS) of the posttest on
the pretest is

YROT = 4.882 + .817*XROT,    $\sigma^2 = 42.20$,
              (.017)

which is in close agreement with the OLS fit reported above for the larger
data set (3,136 pupils in 88 schools). The corresponding simple variance
component model fit is:

YROT = 5.670 + .720*XROT
              (.020)

$\sigma^2 = 38.79$
$\tau^2 = 4.02$.

Compared to the larger data set, equation 13, we find some
discrepancies: the fitted regression slope for the smaller data set is higher
(.720 versus .699) and the group-level variance is smaller (4.02 versus 4.78).
The variation of the slope on XROT is not significant in either sample, but it
is two-and-a-half times as great in the larger data set (.00416) as in the
smaller one (.00166). It appears that the 28 schools added to the data are
more likely to have lower regression slopes and contain proportionately more
schools at the extremes (very "good" or very "bad"), because the larger sample
has larger group-level variance, $\tau^2$. We emphasize that all these
differences may arise purely by chance, rather than as a result of non-random
missingness of the data, but they can have a substantial effect on the
inferences drawn.

The OLS and VCS model estimates for the 2,076/60 data using all the 
explanatory variables are given in Table 4. The dominant explanatory power of 
the pretest score XROT is obvious, as evidenced not only by the t-ratio for 
its regression coefficient (32.38 for OLS and 30.80 for VCS), but also by the 
comparison of the variance component estimates across models. The raw 
variance component estimates are:

σ²(raw) = 57.30
τ²(raw) = 28.83.

Table 4: OLS and VCS Model Estimates for 2,076 Students and
60 Classrooms/Schools Using All 31 Explanatory Variables,
Thailand, 1981-82

                              OLS                     VCS
Variable                  Estimate   St. Error    Estimate   St. Error

Student Level
GRAND MEAN                  18.603           -      19.717           -
XROT                          .680        .021        .647        .021
XAGE                         -.080        .016       -.077 [illegible]
XSEX                          .732        .301        .969        .319
YFOCCI                        .174        .431        .033        .434
                             -.631        .462       -.646        .460
                             -.178        .541       -.239        .542
YMEDUC                        .021        .327       -.039        .325
                             -.129        .562       -.157        .556
                             -.686        .661       -.899        .663
YHCALC                       -.120        .310       -.217        .309
YHLANG                        .203        .315        .012        .341
YMOREED                      1.087        .546       1.074        .541
                             1.570        .545       1.537        .541
                             1.638        .593       1.610        .589
YPARENC                       .225        .137        .249        .136
YPERCEV                      -.980        .160      -1.020        .161
YFUTURE                       .574        .168        .526        .167
YDESIRE                       .277        .236        .228        .233

Group Level
spci81                        .061        .042        .073        .060
[senrolt]                     .422        .263        .417        .386
sstream                      -.426        .358       -.500        .512
sdaysyr                      -.006        .020       -.010        .029
[sputear]                    -.152        .051       -.170        .075
squalmt                      1.023        .342       1.029        .494
tedmath                      -.035        .037       -.044        .053
tsex                         -.580        .336       -.619        .481
tage                          .009        .032       -.001        .046
texptch                       .014        .043        .038        .064
tnstuds                       .035        .018        .039        .025
tmthsub                      1.725        .432       1.941        .628
txtbook                      1.602        .338       1.650        .490
cefeed                        .148        .203        .209        .290
[tworkbk]                   -1.[?]        .218      -1.124        .314
tvismat                [illegible]        .331        .461        .480
tadminl                      -.003        .004        .003        .006
torderl                      -.037        .012       -.039        .016
[tseatl]                      .011        .005        .011        .007

Variance                    38.031       6.167
Pupil-level variance                                36.809
Pupil-level sigma                                    6.067
Group-level variance                                 1.317
Group-level sigma                                    1.148       0.192
Deviance                                         13424.947

Note: Unlabelled rows are the further categories of the categorical variable
named above them. Variable names in brackets are not legible in the source
and are inferred from the row order of Tables 5 and 6.

The pretest score XROT on its own reduces these variances to 38.79
(R²p = 32%) and 4.02 (R²g = 86%), where R² denotes the proportional reduction
relative to the raw variance at the pupil (p) or group (g) level. However, the
other 30 variables reduce the pupil-level variance only marginally, to 36.8
(R²p = 36%). The group-level variance is almost saturated at 1.32
(R²g = 95.5%). It appears that we have abundant information about the groups,
but we are less successful with an explanation, or suitable description, of
the pupil-level variation.
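
The R² figures quoted above can be checked with a few lines of arithmetic. The sketch below is not part of the original analysis; the function and dictionary names are illustrative, and the inputs are the variance components quoted in the text and in Table 4.

```python
def variance_explained(raw, residual):
    """Proportional reduction in a variance component: 1 - residual / raw."""
    return 1.0 - residual / raw


# Variance components for the 2,076 pupils in 60 schools, as quoted in the text.
RAW = {"pupil": 57.30, "group": 28.83}       # no explanatory variables
PRETEST = {"pupil": 38.79, "group": 4.02}    # pretest (XROT) only
FULL = {"pupil": 36.81, "group": 1.32}       # all 31 variables (Table 4)

for label, fit in (("XROT only", PRETEST), ("all variables", FULL)):
    r2_pupil = variance_explained(RAW["pupil"], fit["pupil"])
    r2_group = variance_explained(RAW["group"], fit["group"])
    print(f"{label}: pupil-level R2 = {r2_pupil:.1%}, group-level R2 = {r2_group:.1%}")
```

Because the inputs are rounded, the recomputed percentages can differ from the published ones in the last digit.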

The relatively large number of group-level variables raises a 
concern about multicollinearity, i.e., competing alternative descriptions of
the data. To deal with this problem we apply a conservative criterion for the
exclusion of explanatory variables from our models. We regard a variable as
not "important" for the fixed part of the VCS model if the t-ratio of its
regression coefficient is smaller than 0.9 at the first stage of model
reduction and 1.0 thereafter. In the first round of simplifying the model, we
use the 0.9 criterion to exclude two pupil-level social class variables
(calculator in the home [YHCALC] and use of the language of instruction in the
home [YHLANG]) and six group-level variables: four indicators of resource
inputs (number of days in the school year [sdaysyr], teacher's postsecondary
mathematics education [tedmath], teacher's age [tage], and teaching experience
[texptch]) and two teaching process variables (frequent use of individual
feedback [cefeed] and time spent in routine administration [tadminl]) from the
full list of 31 variables.
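
The exclusion rule can be sketched as a simple screen on absolute t-ratios. This is an illustration only, not the authors' procedure or software: the helper names are hypothetical, and the (estimate, standard error) pairs are a subset of the VCS columns as read from Table 4.

```python
# (estimate, standard error) for selected VCS coefficients, as read from Table 4.
VCS_TABLE4 = {
    "YHCALC": (-0.217, 0.309),
    "YHLANG": (0.012, 0.341),
    "sdaysyr": (-0.010, 0.029),
    "tedmath": (-0.044, 0.053),
    "tage": (-0.001, 0.046),
    "texptch": (0.038, 0.064),
    "tvismat": (0.461, 0.480),
    "tmthsub": (1.941, 0.628),
}


def t_ratio(estimate, std_error):
    """Absolute ratio of a coefficient to its standard error."""
    return abs(estimate / std_error)


def screen(coefs, threshold):
    """Split variable names into (retained, excluded) by |t| >= threshold."""
    retained = [name for name, (b, se) in coefs.items() if t_ratio(b, se) >= threshold]
    excluded = [name for name in coefs if name not in retained]
    return retained, excluded


retained, excluded = screen(VCS_TABLE4, threshold=0.9)
print("retained:", retained)
print("excluded:", excluded)
```

With the 0.9 threshold, tvismat (t-ratio about 0.96) is only just retained at this stage, which is consistent with its failing the stricter criterion once the model is re-estimated.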

Second model. Next we estimate both the OLS and VCS models using
this shorter list of 23 variables. The results are shown in Table 5.
Exclusion of the eight variables (eight degrees of freedom) has virtually no
effect on the retained regression parameters and their standard errors
(compare Tables 4 and 5); the exception is an indicator of instructional
materials (use of commercial visual materials [tvismat]), which now fails to
meet the inclusion criterion. The increase in the variance components is only
marginal, in particular for the group-level variance. The difference in
deviances is 3.3 (χ² with 8 degrees of freedom).
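
As a rough check on how small this difference is, the deviance difference can be referred to a chi-square distribution with 8 degrees of freedom. The sketch below is an illustration, not the software used in the study; it uses the closed-form upper-tail probability that holds for even degrees of freedom, so that only the standard library is needed. The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


deviance_full = 13424.947      # Table 4, 31 explanatory variables
deviance_reduced = 13428.295   # Table 5, 23 explanatory variables
lr = deviance_reduced - deviance_full
p_value = chi2_sf_even_df(lr, df=8)
print(f"deviance difference = {lr:.3f}, p = {p_value:.2f}")
```

A large tail probability here indicates that dropping the eight variables costs very little in fit.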

Table 5: OLS and VCS Model Estimates for 2,076 Students and
60 Classrooms/Schools Using 23 Explanatory Variables,
Thailand, 1981-82

                              OLS                     VCS
Variable                  Estimate   St. Error    Estimate   St. Error

Student Level
GRAND MEAN                  18.118           -      18.370           -
XROT                          .685        .020        .650        .021
XAGE                         -.080        .016       -.076        .016
XSEX                          .723        .299        .958        .318
YFOCCI                        .118        .426        .033        .432
                             -.621        .457       -.651        .457
                             -.139        .538       -.212        .541
YMEDUC                        .037        .326       -.028        .325
                             -.068        .559       -.115        .555
                             -.604        .656       -.855        .660
YMOREED                      1.115        .545       1.083        .540
                             1.568        .543       1.521        .540
                             1.666        .591       1.609        .589
YPARENC                       .238        .137        .255        .135
YPERCEV                      -.970        .160      -1.010        .161
YFUTURE                       .570        .168        .526        .167
YDESIRE                       .287        .235        .234        .233

Group Level
spci81                        .050        .038        .058        .056
senrolt                       .509        .251        .540        .373
sstream                      -.441        .324       -.503        .472
sputear                      -.178        .046       -.198        .068
squalmt                      1.062        .327       1.090        .480
tsex                         -.518        .314       -.536        .460
tnstuds                       .036        .017        .038        .025
tmthsub                      1.802        .409       2.094        .604
txtbook                      1.649        .315       1.673        .463
tworkbk                   -1.02[?]        .204      -1.039        .300
tvismat                       .368        .322        .393        .473
torderl                      -.040        .010       -.043        .014
tseatl                        .010        .005        .011        .007

Variance                    38.108       6.173
Pupil-level variance                                36.855
Pupil-level sigma                                    6.071
Group-level variance                                 1.351
Group-level sigma                                    1.162
Deviance                                         13428.295

Again we obtain the largest data set obtainable by listwise deletion
with respect to the retained variables; this procedure yields data for 2,804
pupils in 80 schools. We then compute the variance component analysis for
this data set; the results are given in Table 6. We see that the regression
coefficients for the pupil-level variables are stable across the data sets (as
compared with Tables 4 and 5), but the discrepancies for the group-level
variables are substantial. There are two separate, but possibly
complementary, explanations for these discrepancies: multicollinearity and
non-random missingness of data. Multicollinearity would cause the regression
estimates to be sensitive to changes in the data, in our case to the inclusion
of over 700 new observations. As an alternative, the discrepancies could
arise as a result of the non-random missingness in our data, i.e., if the two
data sets have genuinely different regression characteristics. A suitable
indication, although not a fool-proof check, for the latter possibility is
obtained by fitting the models with identical specifications for the different
"working" data sets. We have fitted the reduced second model (Table 5) to the
larger data set (Table 6), and although we obtained different values for the
group-level regression coefficients, it turns out that the reduced list of
variables also provides an adequate description for the data (as judged by the
likelihood ratio criterion). The pupil-level regression coefficients differ
only marginally.

We conclude, therefore, that multicollinearity is the more likely
cause of the discrepancies in the estimates: we have too many group-level 
variables, so that the parameter estimates are subject to large fluctuations 
when small changes are made in the data. The explanatory variables provide 
sufficient conditioning for the outcome data to be missing at random, given 
the available explanatory variables. 

In keeping with our exclusion criterion (t-ratio < 1), we now delete
from the fixed part of the model six group-level variables. Four are
conventional material and non-material input variables (district-level per
capita income [spci81], teacher gender [tsex], class size [tnstuds], and use
of commercial visual materials [tvismat]) and two are organization and
process variables (student time doing seatwork [tseatl] and ability grouping
[sstream]).

Table 6: OLS and VCS Model Estimates for 2,804 Students and
80 Classrooms/Schools Using 23 Explanatory Variables,
Thailand, 1981-82

                              OLS                     VCS
Variable                  Estimate   St. Error    Estimate   St. Error

Student Level
GRAND MEAN                  17.659           -      17.314           -
XROT                          .699        .017        .634        .019
XAGE                         -.079        .014       -.073        .014
XSEX                          .746        .251       1.103        .271
YFOCCI                        .197        .363        .101        .367
                             -.403        .389       -.458        .386
                              .089        .458        .085        .458
YMEDUC                        .306        .279        .293        .276
                              .088        .465        .142        .458
                             -.018        .567       -.309        .566
YMOREED                       .861        .476        .786        .467
                             1.086        .475       1.015        .468
                             1.617        .519       1.542        .512
YPARENC                       .388        .118        .375        .116
YPERCEV                     -1.083        .137      -1.131        .136
YFUTURE                       .576        .142        .533        .141
YDESIRE                       .493        .201        .439        .198

Group Level
spci81                       -.029        .033       -.025        .057
senrolt                       .437        .187        .481        .331
sstream                      -.417        .275       -.422        .473
sputear                      -.095        .032       -.110        .058
squalmt                       .698        .246        .784        .429
tsex                         -.038        .266        .014        .463
tnstuds                       .012        .014        .020        .023
tmthsub                      1.836        .344       2.398        .593
txtbook                       .948        .266        .978        .461
tworkbk                      -.500        .167       -.499        .291
tvismat                       .353        .269        .363        .468
torderl                      -.024        .008       -.027        .013
tseatl                        .005        .004        .006        .006

Variance                    37.949       6.160
Pupil-level variance                                35.868
Pupil-level sigma                                    5.989
Group-level variance                                 2.285
Group-level sigma                                    1.512       0.174
Deviance                                         18088.395

Third model. As before, we estimate this model with both the
smaller and larger data sets. The estimates from the OLS and VCS models for
the former, using the reduced list of variables, are given in Table 7; the
same schools and pupils are involved as for Table 6. For the latter, larger
data set of 3,025 students in 86 schools, we fit the reduced model (17
variables) and present the results in Table 8. Again, the difference in
deviances (3.5, χ² with 6 degrees of freedom) is small. The effects of
non-random missingness can be checked by comparing the estimates in Tables 7
and 8. Applying our exclusion criterion to the variables in Model 3, we find
that no further reduction of the list of explanatory variables is possible.

Note that, because of the relatively small number of schools, the 
appropriate conclusion about the 14 group-level variables we deleted is that 
"we found insufficient evidence" of a systematic effect of these variables, 
rather than "our analysis disproves their effects." Further, a different 
modelling scheme could lead to a different "minimal" set of important
explanatory variables. Because of collinearity, there may be a set of
alternative regression formulae that give a model fit that is not
substantially inferior to the one given in Table 8 in terms of the deviances.

A summary of the results of these analyses is provided in Table 9. 
In all the models, student background characteristics are important 
determinants of mathematics learning over time. School-level resources also 
appear to have an important impact on achievement, with students in the larger 
schools learning more than students in the smaller schools and students in 
schools with a higher percentage of teachers qualified to teach mathematics 
learning more than students in schools with a lower percentage of qualified 
teachers; however, students in the schools with a higher student/teacher ratio 
also learned more. 

Classroom variables also affect achievement. Students in
non-remedial classes learned more than students in remedial classes; students
in classes where the teacher used textbooks more often learned more than
students in classes in which textbooks were not used. On the other hand,
workbooks and teacher time spent maintaining order were negatively related to
learning.

Table 7: OLS and VCS Model Estimates for 2,804 Students and
80 Classrooms/Schools Using 17 Explanatory Variables,
Thailand, 1981-82

                              OLS                     VCS
Variable                  Estimate   St. Error    Estimate   St. Error

Student Level
GRAND MEAN                  17.321           -      17.694           -
XROT                          .704        .017        .635        .018
XAGE                         -.077        .014       -.073        .014
XSEX                          .676        .247       1.086        .270
YFOCCI                        .181        .357        .085        .365
                             -.419        .387       -.465        .385
                              .105        .455        .082        .457
YMEDUC                        .293        .280        .288        .276
                              .112        .465        .154        .458
                              .014        .563       -.297        .564
YMOREED                       .869        .476        .786        .467
                             1.128        .476       1.027        .468
                             1.666        .520       1.560        .512
YPARENC                       .393        .117        .377        .116
YPERCEV                     -1.076        .137      -1.130        .136
YFUTURE                       .592        .142        .537        .141
YDESIRE                       .477        .201        .431        .197

Group Level
senrolt                       .285        .164        .367        .289
sputear                      -.074        .030       -.094        .054
squalmt                       .808        .239        .880        .427
tmthsub                      1.950        .329       2.562        .576
txtbook                       .948        .259        .946        .458
tworkbk                      -.433        .160       -.402        .284
torderl                      -.022        .006       -.024        .010

Variance                    38.065       6.170
Pupil-level variance                                35.871
Pupil-level sigma                                    5.989
Group-level variance                                 2.429
Group-level sigma                                    1.558       0.176
Deviance                                         18091.983

Table 8: OLS and VCS Model Estimates for 3,025 Students and
86 Classrooms/Schools Using 17 Explanatory Variables,
Thailand, 1981-82

                              OLS                     VCS
Variable                  Estimate   St. Error    Estimate   St. Error

Student Level
GRAND MEAN                  17.238           -      17.536           -
XROT                          .695        .017        .629        .018
XAGE                         -.075 [illegible]       -.071        .014
XSEX                          .658        .238       1.053        .260
YFOCCI                        .152        .343        .074        .351
                             -.415 [illegible]       -.435        .373
                       [illegible] [illegible]        .123        .446
YMEDUC                        .371        .269        .343 [illegible]
                             -.056        .449        .073 [illegible]
                              .066        .554       -.259        .555
YMOREED                       .854 [illegible]        .755 [illegible]
                             1.195 [illegible]       1.064        .452
                       [illegible] [illegible]       1.532        .494
YPARENC                       .361 [illegible]        .347        .112
YPERCEV                     -1.140 [illegible]      -1.191        .132
YFUTURE                       .614        .137        .543        .136
YDESIRE                       .484        .194 [illegible] [illegible]

Group Level
senrolt                       .271        .160        .350 [illegible]
sputear                      -.076        .029       -.094 [illegible]
squalmt                       .847        .232        .903        .410
tmthsub                      1.968        .327 [illegible]        .566
txtbook                      1.047        .250       1.071 [illegible]
tworkbk                      -.434        .157       -.417 [illegible]
torderl                      -.023        .006 [illegible]        .010

Variance                    38.271       6.186
Pupil-level variance                                36.138
Pupil-level sigma                                    6.012
Group-level variance                                 2.353
Group-level sigma                                    1.534        .169
Deviance                                         19537.962

Table 9: Summary of Tables

                                              Tables
                                4        5        6        7        8

OLS variance                 38.03    38.11    37.95    38.07    38.27
  St. error                   6.17     6.17     6.16     6.17     6.19
VCS pupil-level variance     36.81    36.96    35.87    35.87    36.14
  Sigma                       6.07     6.08     5.99     5.99     6.01
VCS group-level variance
  For G. mean                 1.32     1.35     2.29     2.43     2.35
  Sigma                       1.15     1.16     1.51     1.56     1.53
  St. error for sigma         0.19     0.19     0.17     0.17     0.17
Sample size
  Pupils                     2,076    2,076    2,804    2,804    3,025
  Groups                        60       60       80       80       86

Several researchers have considered the contextual effects in 
educational studies involving multi-level data (see Raudenbush and Bryk 1986).
In our case, contextual analysis would involve using within-school means of
pupil-level variables as school-level variables. However, as was pointed out
earlier, we have abundant school-level information (14 school-level variables
for 99 schools), and contextual analysis would only aggravate further the high
level of confounding of the school-level variables. Contextual variables are
more relevant in studies where the aim is to produce, or at least consider, a
ranking of schools. The ranking may depend crucially on the explanatory
variables used and can often be affected by even the inclusion of variables
with statistically insignificant regression coefficients. This point
highlights the need to select models based on educational theory rather than
on purely statistical criteria that contain a great deal of arbitrariness.

Modelling of Group-Level Variation (Random Slopes and Random Differences)

Simultaneously with reducing the fixed (regression) part of the 
variance component model for our data, we also need to explore extensions of 
the random part to obtain a better description of the group-level variation 
than the one offered by the group-level variance. We concentrate first on a 
reduction of the fixed part to a shorter list of explanatory variables 
because: (i) the school-level variation is rather small and (ii) in the models 
with complex descriptions of variation, the estimates of fixed effects and 
their standard errors differ very little from those obtained so far (Table 8). 

In the variance component models fitted so far (Tables 4-8), the 
within-group regressions are assumed to be constant across groups, with the
exception of the intercept (position), which has a fitted variance of 2.35.
More generally, the regression coefficients with respect to any of the
pupil-level variables may be allowed to vary across the groups. These
variables, selected from the variables included in the fixed part, form the
random part of the model. The group-level variables are not considered for
the random part, because within-group regressions with respect to such
variables cannot be identified.

Variance component models closely resemble the models for the 
analysis of covariance. The simple variance component models correspond to 
ANOCOVA models, with no interactions of covariates with the grouping factor. 
The (complex) variance component models, with variable within-group regressions 
(slopes and/or differences) correspond to ANOCOVA models with group x 
covariate interactions. The difference between the variance component and 
ANOCOVA models is in their emphasis on the description of variation as opposed 
to differences among the groups and in the assumption of the normality of the 
group effects in the former. The model specification in both models is
analogous:

(a) list of covariates (fixed part),

(b) sublist of covariates that have interactions with the grouping
factor (random part).

We now turn to modelling the random part. For a continuous variable 
included in the random part, the within-group regression slopes with respect 
to this variable are assumed to vary randomly (and to be distributed normally) 
with an unknown variance. For a categorical variable included in the random 
part, the within-group (adjusted) differences among the categories are 
normally distributed. We can consider the "stereotypical" group, for which 
the regression is given by the fixed part model (the average regression),
with the regressions for the groups varying around this average regression.
The deviations of the regression coefficients form a random sample (i.i.d.)
from a multivariate normal distribution. The components of the vector of
deviations (for a group) cannot be assumed to be independent; thus, their
covariance structure has to be considered. However, the variances of these
deviations (or random effects) are the main interest.
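
The description above can be written compactly as follows. The notation is an assumption made here for illustration and is not taken from the paper: i indexes pupils, j indexes groups, x_ij holds the covariates of the fixed part, and z_ij holds the sublist of covariates (including the constant) that forms the random part.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
       + \mathbf{z}_{ij}^{\top}\mathbf{u}_{j} + e_{ij},
\qquad
\mathbf{u}_{j} \overset{\text{i.i.d.}}{\sim} N(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad
e_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^{2}).
```

In this notation the simple variance component model is the special case in which z_ij contains only the constant, so that Σ reduces to the single group-level variance τ².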

Data with only a moderate number of groups and with limited numbers 
of subjects within groups (classroom sizes), as is the case in this analysis, 
contain only limited information about variation, comparable to the limited 
information about interactions in models of analysis of covariance. Usually, 
information about the covariance structure is even scarcer. Therefore, if
many variances are included in the random part (and estimated as free
parameters), we can expect high correlations among the estimates: large
estimated variances with large standard errors. Moreover, the number of
covariances to be estimated grows rapidly with the number of variances, and
many of the estimated correlations corresponding to these covariances are then
close to +1 or -1. The variance matrix with these variances and covariances
is not of full rank, and the random effects are linearly dependent. Therefore
it is important to adhere to the principle of parsimony and seek the simplest
adequate description for group-level variation. In selecting the covariances
to be estimated, we use the guidelines set by Goldstein (1987) and Longford
(1987).

Although selection of a model for the random part involves only 
pupil-level variables (inclusion/exclusion), it is more complex than the
selection for the fixed part because constraints can also be imposed on the
covariances. The most general variance component model would involve 17
variances (the number of regression parameters in Table 8) and 17*16/2 = 136
covariances. Fitting such a model is clearly not a realistic proposition.
Thus, model selection has to proceed by building up the random part from
simpler to more complex models. The models fitted are all invariant with
respect to the choice of the location of the explanatory variables. In the
computations, all the variables are centered around the overall mean, and the
estimated variance matrix refers to this "centered" parametrization. However,
the variance matrix for a different parametrization is easy to calculate by a
quadratic transformation.
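
The quadratic transformation can be sketched for the simplest case of a random intercept and one random slope. This is an illustration under stated assumptions rather than the authors' computation: the variance matrix entries and the covariate mean m below are illustrative numbers, and the function name is hypothetical.

```python
def transform_variance(sigma, t):
    """Return T * Sigma * T^T for square matrices given as nested lists."""
    n = len(sigma)
    ts = [[sum(t[i][k] * sigma[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(ts[i][k] * t[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Intercept/slope variance matrix under the centred parametrization
# (illustrative values).
sigma_centred = [[2.415, 0.0455],
                 [0.0455, 0.00390]]
m = 10.0  # hypothetical overall mean of the covariate

# The random part u0 + u1 * (x - m) equals (u0 - m * u1) + u1 * x, so the
# uncentred random effects are T applied to (u0, u1).
T = [[1.0, -m],
     [0.0, 1.0]]
sigma_uncentred = transform_variance(sigma_centred, T)
print(sigma_uncentred)
```

The slope variance is unchanged by the shift in location; only the intercept variance and the intercept-by-slope covariance depend on where the covariate is centred.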

In selecting the model for the random part, we proceed according to
the following stages. For all the models we use the same fixed part as in
Table 8. The estimates and standard errors for the regression parameters
differ very slightly from those in Table 8 for all these models. This fact
justifies post hoc our approach of first settling the fixed part and then
modelling the random parts. First we fit models with one pupil-level variable
in the random part. Using the likelihood ratio test to compare the fitted
model to the model with the simple random part (Table 8), we select the
following variables: pretest score (XROT); age (XAGE); motivation (YDESIRE);
and educational expectation (YMOREED).

The first three variables are ordinal and are associated with one 
variance each. The likelihood ratio (the difference of the deviances) for 
each of the three corresponding models is larger than 3. This criterion is 
intentionally very conservative, since we prefer to err on the side of 
inclusion. Two parameters are involved, a variance (slope-variance) and a
covariance (slope-by-intercept covariance), but they are not free
parameters, since they have to satisfy the condition of positive definiteness.
The distribution of the difference of the deviances is χ² with 2 degrees of
freedom only if the correlation corresponding to the covariance is not equal
to +1 or -1. The problem of negative variances is resolved by estimating the
square roots of the variances (sigmas). In the actual computational algorithm,
negative sigmas do not arise, and the estimated variance matrix is always
non-negative definite.

Next we fit the VC model with these four variables in the random 
part and simplify the random part by excluding variables and setting certain 
covariances to 0. The variance associated with the variable XAGE is very 
small (.00095), and its square root has a low t-ratio (.75), so that it can be 
constrained to 0 (excluded) . The implication is a constraint on all the 
covariances involving XAGE, which are also set to 0. The three remaining 
variables and the intercept are represented by a 6x6 variance matrix: 6 
variances and 15 covariances, almost as many parameters as are in the fixed 
part. The fitted variance matrix is: 

Intercept         2.581
XROT              .0143    .00558
YMOREED Cat. 2    .191     .0388    .812
        Cat. 3    .519     .0439    .0621    1.032
        Cat. 4    .384     .0354    -.0241   .261     1.032
YDESIRE           .0863    -.0127   -.307    -.303    -.346    [illegible]

The decrement in deviance as compared with the VCS model (Table 8) is only 
13, a result that hardly warrants the addition of these 21 parameters in the 
model.
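
The parameter counts quoted for the random part follow from the size of the variance matrix. A small sketch (the function name is hypothetical):

```python
def random_part_size(n_terms):
    """Return (variances, covariances) for an unconstrained n_terms x n_terms variance matrix."""
    return n_terms, n_terms * (n_terms - 1) // 2


# Intercept, XROT, three YMOREED contrasts and YDESIRE: 6 random terms.
print(random_part_size(6))    # (6, 15), i.e. 21 parameters in all
# All 17 regression parameters of Table 8 in the random part.
print(random_part_size(17))   # (17, 136)
```

The number of covariances grows quadratically with the number of random terms, which is why the random part is built up from simple to more complex models.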

The software used provides standard errors for the square roots of 
the variances (sigmas and diagonal elements of the matrix) and for the 
covariances. The sigmas and their standard errors are:

             Intercept     XROT    YMOREED   YMOREED   YMOREED   YDESIRE
                                    cat. 2    cat. 3    cat. 4

Sigma           1.607     .0747      .901     1.175     1.016      .828
St. error        .176     .0261      .429      .451      .640      .295


The standard errors for the covariances involving XROT and 
categories of YMOREED (rows 3-5 in column 2) are between .059 - .063 and for 
those involving YDESIRE and YMOREED (columns 3-5 in row 6) are .56 - .62. 
Since each of these covariances has a small t-ratio, they are constrained to 0 
in the next model. The following estimated variance matrix is obtained (the 
sigmas and their standard errors are given to the right of the variance 
matrix):

Variable          Matrix                                                Sigma    St. Error

Intercept         2.237                                                 1.496    .173
XROT              .0141    .00343                                       .0586    .0317
YMOREED Cat. 2    .199     0        .0230                               .152     .639
        Cat. 3    .601     0        .0791    1.490                      1.221    .439
        Cat. 4    .[?]43   0        .003     .392     .826              .989     .753
YDESIRE           .119     -.0178   0        0        0        .746     .864     .276

Exclusion of these six covariances leads to an increase in the deviance of
only 1.8. The variance associated with the second category of YMOREED falls
substantially, and it can also be constrained to 0, together with the three
covariances in the same row and column of the variance matrix. Constraining
these four parameters causes an increase in the deviance of only .2. The
reestimated variance matrix is:

Variable          Matrix                                                Sigma    St. Error

Intercept         2.415                                                 1.554    .162
XROT              .0455    .00390                                       .0625    .0313
YMOREED Cat. 2    0        0        0                                   0        0
        Cat. 3    1.136    0        0        1.788                      1.337    .341
        Cat. 4    .740     0        0        1.157    1.424             1.193    .514
YDESIRE           .304     -.0436   0        0        0        .830     .911     .260



The rank of this matrix is 4 (the two variance matrices given above 
are also singular) . Thus it appears that another variance parameter can be 
constrained to 0. However, the t-ratio for each of the sigmas is high, and 
only a complex linear reparametrization of the variables included in the 
random part would enable further simplification of the model.

The variance matrix obtained provides a description of group-level 
variation in terms of 11 parameters, 5 variances and 6 covariances. However,
the difference between the deviances in this model and the corresponding VCS
model is only 11 (for 10 parameters). That result provides further evidence
of overparametrization or collinearity in the random part. However, any
attempt to define a suitable model with fewer parameters would necessarily
involve some unnaturally defined variables, which would make interpretation of
the model very difficult. We interpret these estimates as discussed below.

The variation in the slope on XROT provides evidence of an unequal 
"conversion" of ability at the beginning of the year into ability at the end 
of the year. Such a conclusion is appropriate only subject to the caveats 
discussed in the summary chapter. The slope on XROT is shallower in some 
schools, where the initial differences in XROT tend to be associated with 
smaller differences in YROT than in schools where the slopes are steeper. 

The regression slope for YDESIRE is about .5, which is the 
regression slope for the "stereotypical" school, where every feature is 
"average." The variation associated with this regression slope has a 
standard deviation of .9; that is, there is a large (predicted) proportion of 
schools where the slope on YDESIRE is very small or even negative. The 
correlation of the within-group slopes on XROT and YDESIRE is -.77: lower
"effects" of motivation to succeed are associated with schools where the 
initial differences become exaggerated by the end of the year. 
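
Under the model's normality assumption, the predicted proportion of schools with a negative slope on YDESIRE can be approximated from the mean slope and its standard deviation. The sketch below is illustrative only; the function names are hypothetical, and the inputs are the rounded values quoted above (mean .5, standard deviation .9).

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_below_zero(mean, sd):
    """P(slope < 0) when school slopes are distributed N(mean, sd**2)."""
    return normal_cdf((0.0 - mean) / sd)


share = share_below_zero(mean=0.5, sd=0.9)
print(f"predicted share of schools with a negative YDESIRE slope: {share:.0%}")
```

With these inputs, somewhat under a third of schools are predicted to have a negative slope, which is what "a large (predicted) proportion" refers to.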

The variances associated with categories 3 and 4 of YMOREED
(expectations to complete five or more years of schooling) represent the
variation of the adjusted differences between categories 3 and 1 (expectation
to complete fewer than two more years of education) and 4 and 1, respectively.
While the fitted difference between categories 2 (two to four more years) and
1 is about .8 and constant for all the schools, the average within-school
difference between categories 3 and 1 is 1.1, with a variance of 1.8.
Therefore this difference is negative in several schools. The situation with
the categories 4 and 1 contrast is similar, although the number of schools
with the reversed sign of the difference is much smaller. The correlation of
the random effects associated with categories 3 and 4 is .725; a high 3-1
contrast is associated with a high 4-1 contrast; but the fitted variance for
the contrast 4-3 is 1.79 + 1.42 - 2*1.16 = .89, whereas the average difference
is 1.58 - 1.08 = .50. Hence there are schools where the pupils with
YMOREED = 3 have higher adjusted scores on YROT than those with YMOREED = 4,
although on average the fourth category is .5 points ahead.
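
The correlations and the contrast variance quoted in this discussion follow directly from the entries of the re-estimated variance matrix given above. A minimal sketch, with hypothetical function and variable names:

```python
import math


def correlation(cov, var_a, var_b):
    """Correlation implied by a covariance and the two variances."""
    return cov / math.sqrt(var_a * var_b)


def contrast_variance(var_a, var_b, cov):
    """Variance of the difference (a - b) of two correlated random effects."""
    return var_a + var_b - 2.0 * cov


# Entries of the re-estimated variance matrix given above.
var_cat3, var_cat4, cov_34 = 1.788, 1.424, 1.157
var_xrot, var_ydesire, cov_xrot_ydesire = 0.00390, 0.830, -0.0436

r_34 = correlation(cov_34, var_cat3, var_cat4)
r_slopes = correlation(cov_xrot_ydesire, var_xrot, var_ydesire)
var_4_minus_3 = contrast_variance(var_cat4, var_cat3, cov_34)

print(f"corr(cat. 3, cat. 4) = {r_34:.3f}")
print(f"corr(XROT slope, YDESIRE slope) = {r_slopes:.2f}")
print(f"var(cat. 4 - cat. 3) = {var_4_minus_3:.2f}")
```

The text works with entries rounded to two decimals, so its contrast variance (.89) differs slightly from the value obtained with the three-decimal entries.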

The estimates of the regression parameters differ only marginally 
for the different specifications of the random part. This result justifies, 
post hoc, our approach of modelling first the regression part of the model and 
then the random part. The regression estimates for the last model considered 
are given in Table 10. 

Conditional Expectations of the Random Effects 

In the fixed-effects ANOVA or ANOCOVA, estimates of the effects
associated with the groups are obtained. In variance component models, these 
effects are represented by random variables. Conditional upon the adopted 
model, the expectations of the (random) group-effects can be considered as the 
group-level residuals, or as "estimates" of the group-effects. These 
conditional expectations have to be inspected as to whether they conform with 
the assumptions of normality. This inspection involves a check for skewness 
and kurtosis (not carried out here, but visual inspection indicates no 
problems) and a check for outlying values of the effects. The latter check is 
obviously also of substantive importance because it would be useful to detect 
schools with exceptionally high or low performance, where the categories of
YMOREED have substantially different differences than do average schools, or
in which the outcomes are more/less influenced by the initial score XROT.

Table 10: Fixed-effect Estimates for the Final Model with Random
Effects for 3,025 Students and 86 Classrooms/Schools Using
18 Explanatory Variables, Thailand, 1981-82

Variable                  Estimate   St. Error

Student Level
GRAND MEAN                  16.642           -
XROT                          .617        .020
XAGE                         -.070        .014
XSEX                         1.143        .260
YFOCCI                        .101        .352
                             -.488        .374
                              .198        .446
YMEDUC                        .347        .268
                              .062        .446
                             -.491        .560
YMOREED                       .816        .453
                             1.117        .476
                             1.618        .514
YPARENC                       .358        .112
YPERCEV                     -1.178        .133
YFUTURE                       .526        .137
YDESIRE                       .480        .217

Group Level
senrolt                       .300        .265
sputear                      -.063        .048
squalmt                       .781        .380
tmthsub                      2.632        .582
txtbook                      0.949        .431
tworkbk                      -.372        .270
torderl                      -.035        .012
tseatl                        .007        .006

Variance
Pupil-level variance        35.259
Pupil-level sigma            5.938
Group-level variance      See matrix in the text
Group-level sigma         See matrix in the text
Deviance                19,064.902
Number of iterations             8

The complex nature of the variation, involving three variables, 
coupled with the number of groups, makes it infeasible to discuss the 
deviations of the group-level regressions from the average regression. In 
fact, the main motivation for using variance component analysis has been to 
obtain a global description of variation, without reference to individual 
groups. The added advantage is that, owing to the shrinkage property of the 
conditional expectations, extreme results attributable to unreliability for 
some of the schools with small numbers of students are avoided. The conditional 
expectations are a mixture of the pooled ordinary least squares solution and 
the within-group regression; the weight depends on the amount of information 
contained in the data from the group. Conditional expectations are obtained 
even for schools where the number of pupils in the data is smaller than the 
number of regression parameters. Because of this shrinkage, we cannot 
pinpoint all the schools where, say, the difference between categories 3 and 1 
has a negative sign. For several schools, the conditional means indicate a 
small difference among the categories; some of these may be negative, others 
positive and larger than the conditional expectation. Accordingly, we should 
downscale our notion of what is an exceptionally large deviation; for example, 
a 1.5 multiple of the standard deviation (sigma) should be regarded as 
exceptional. 
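
The shrinkage described above can be illustrated for the simplest case, a model with a random intercept only, in which the conditional expectation of a school effect is the school's mean raw residual multiplied by a reliability weight. This is a simplified sketch, not the multi-component model fitted in the paper: the pupil-level variance 35.259 is taken from Table 10, while the group-level variance `TAU2`, the raw residual of 3.0 points and the function names are hypothetical values chosen for illustration.

```python
def shrinkage_weight(n_pupils, tau2, sigma2):
    """Weight given to a school's own mean residual.

    tau2 is the group-level variance, sigma2 the pupil-level variance.
    The weight n*tau2 / (n*tau2 + sigma2) approaches 1 as n grows.
    """
    return n_pupils * tau2 / (n_pupils * tau2 + sigma2)


def conditional_expectation(mean_residual, n_pupils, tau2, sigma2):
    """Shrunken estimate of the school effect (random-intercept case)."""
    return shrinkage_weight(n_pupils, tau2, sigma2) * mean_residual


SIGMA2 = 35.259  # pupil-level variance reported in Table 10
TAU2 = 4.0       # HYPOTHETICAL group-level variance, for illustration only

# The same raw residual is pulled toward zero more strongly in small schools.
for n in (5, 20, 42):
    w = shrinkage_weight(n, TAU2, SIGMA2)
    est = conditional_expectation(3.0, n, TAU2, SIGMA2)
    print(f"n={n:2d}  weight={w:.3f}  shrunken effect={est:.3f}")
```

Because the shrunken effects are less dispersed than the raw school residuals, a smaller multiple of sigma already marks a school as unusual, which is the reason for the 1.5 sigma guideline.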

We conclude with an example of an exceptional school. All the 
random-effects components of school 22 (42 pupils in the data) are positive. 
Its deviation from the average regression formula is 

$$1.517 + .100\,\mathrm{XROT} + .102\,\mathrm{YDESIRE} + 1.008\,\mathrm{YM}_3 + .842\,\mathrm{YM}_4,$$

where $\mathrm{YM}_3$ (and $\mathrm{YM}_4$) are equal to 1 if the pupil is in category 3 (4) and 0 
otherwise. This outcome indicates that school 22 is characterized by high 
performance, with the differences in initial ability tending to get 
exaggerated. That is, pupils with high motivation and high expectations are 
at an advantage. For sample mean values of XROT and YDESIRE, this formula 
becomes 

$$2.959 + 1.008\,\mathrm{YM}_3 + .842\,\mathrm{YM}_4,$$

which reflects the high "performance" of the school much more clearly. The 
variances quoted above refer to the regression using centered versions of all 
the variables ($\mathrm{XROT} - \overline{\mathrm{XROT}}$, $\mathrm{YDESIRE} - \overline{\mathrm{YDESIRE}}$, 
$\mathrm{YM}_3 - \overline{\mathrm{YM}}_3$, $\mathrm{YM}_4 - \overline{\mathrm{YM}}_4$). In 
the transformation from one parametrization to the other, only the 
intercept variance is affected. 
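
The deviation formula for school 22 can be evaluated directly for each category of YMOREED. The sketch below encodes the coefficients quoted above; the function names are hypothetical, and because the sample means of XROT and YDESIRE are not restated in this section, the second function uses only the constant 2.959 that the text gives for pupils at those means.

```python
def school22_deviation(xrot, ydesire, ym3, ym4):
    """Deviation of school 22 from the average regression (uncentered form).

    ym3 / ym4 are 0/1 indicators for YMOREED categories 3 and 4.
    """
    return 1.517 + 0.100 * xrot + 0.102 * ydesire + 1.008 * ym3 + 0.842 * ym4


def school22_deviation_at_means(ym3, ym4):
    """Same deviation for a pupil at the sample means of XROT and YDESIRE."""
    return 2.959 + 1.008 * ym3 + 0.842 * ym4


# Combined contribution of the XROT and YDESIRE terms at their sample means,
# implied by the two constants quoted in the text.
implied_mean_terms = 2.959 - 1.517

# Indicators are both 0 for categories 1 and 2.
for label, ym3, ym4 in (("category 1 or 2", 0, 0),
                        ("category 3", 1, 0),
                        ("category 4", 0, 1)):
    print(f"{label:15s} deviation at means = "
          f"{school22_deviation_at_means(ym3, ym4):.3f}")
```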



CHAPTER V: DISCUSSION 

At the outset of this paper, we posed a series of questions: 
(i) do schools affect student learning differentially? (ii) what part of this 
variation is attributable to between-school characteristics versus 
between-student characteristics? (iii) what characteristics of teachers and 
schools enhance student achievement, independent of student background? 
(iv) are these effects uniform across students? (v) what is the comparative 
effectiveness of alternative inputs? and (vi) how do estimates obtained from 
simple OLS methods compare with estimates obtained from multilevel methods? 
During the analysis, a seventh question arose: are there alternative regression 
models that predict student achievement equally well as the model developed 
herein? In this section, we review our findings and present some caveats 
about their interpretation. 

Summary 

School effects. The first analysis in this paper examined the 
extent to which schools differed in their ability to transform pretest scores 
into posttest scores. We found that the schools in this sample from Thailand 
were equally effective in converting pretest into posttest scores and that 
there were essentially no variable slopes in this respect. That is, the 
results from the simple variance component model did not differ significantly 
from those obtained from the variance component model that included variable 
slopes. 

Contribution of school versus individual characteristics. In our 
second analysis, we examined group and individual effects on total variance. 
Group-level effects contributed 32% of the variance, while individual-level 
effects contributed 68% of the variance in posttest scores, after controlling 
for the pretest scores. We were able to explain most of the group-level 
variation but were less successful in explaining individual variation. 
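
The 32%/68% split corresponds to the usual decomposition of total variance into a group-level and an individual-level component. A minimal sketch follows; the two variance components passed to the function are hypothetical values chosen only to reproduce the reported split, since the estimates from the model used for this decomposition are not repeated in this section.

```python
def variance_shares(group_var, pupil_var):
    """Return the (group, pupil) shares of the total variance."""
    total = group_var + pupil_var
    return group_var / total, pupil_var / total


# HYPOTHETICAL variance components that yield the reported 32% / 68% split.
group_share, pupil_share = variance_shares(group_var=16.0, pupil_var=34.0)

print(f"group-level share: {group_share:.0%}")
print(f"pupil-level share: {pupil_share:.0%}")
```

The group-level share is what is often called the intraclass correlation: the larger it is, the more pupils within the same school resemble one another.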

Effective teacher and school characteristics. The results from 
our final analysis indicate that some teacher and school characteristics are 
positively associated with student learning in Thailand: 

o the percentage of teachers in the school who are qualified to 
teach mathematics, 

o an enriched mathematics curriculum, and 

o the frequent use of textbooks by teachers. 

At the same time, some teaching practices are negatively related to learning: 

o the frequent use of workbooks, and 

o time spent maintaining order in the classroom. 

The positive results are not surprising. Teachers who know the 
subject matter being taught, a curriculum that covers the domain, and 
textbooks that provide a structured presentation of the material all should 
have positive effects on achievement. The negative results are also 
unsurprising. Teachers who spend a great deal of time maintaining classroom 
order will have less time available for teaching; therefore, less learning 
takes place. Similarly, frequent use of workbooks may detract from effective 
teaching, answering questions and so forth. 

Uniformity of effects. In this sample, we found that the schools 
did not have uniform effects on all students. In particular, the effects 
differed according to the level of students' expectations about further 
education. Some schools/classrooms were more effective for students with low 
expectations, some were more effective for students with high expectations, 
while others were equally effective (or ineffective) for all types of 
students. Interestingly enough, we found little evidence that schools were 
differentially effective for students on the basis of gender, age, parental 
occupation or several other student attitudes. 

Comparative effectiveness of inputs. Overall, we found few school 
"inputs" that were associated with differential achievement over time. 
Frequent use of textbooks increased achievement by a full point on the 
posttest, while use of workbooks decreased achievement by a third of a point; 
an enriched curriculum increased posttest scores by over 2.5 points. Each 
additional percentage point of teachers qualified to teach mathematics raised 
posttest scores by over 1 point. 

However, these causal statements do not hold if they are to be 
interpreted as the result of an external intervention. Obtaining (additional) 
textbooks for the schools is not a simple procedure unrelated to educational 
processes and management decisions; it is itself an outcome variable related 
to some (unknown) aspects of the educational process. Similarly, discarding 
workbooks might not lead to improved outcomes, unless all the circumstances 
that lead to reduced use of workbooks are also present or are induced 
externally. External intervention will be free of risk only if we have, and 
apply, causal models for how the educational system functions. The models 
developed in this paper, and elsewhere in the literature on educational 
research, are purely descriptive. Use of regression methods and of variance 
component analysis allows improved description but does not provide inferences 
about causal relationships. 

In addition, interpretations of the estimates of effects are subject 
to a variety of influences, and there may be alternative regression models, 
with different variables, that are equally correct in terms of prediction. 
Thus, the selection of variables included in this model is responsible, to 
some degree, for the results, and a different selection of variables could 
yield substantially different results with respect to the contribution of each 
variable. 

Comparison with OLS. The analysis demonstrates that estimates based 
on OLS regressions do yield different results, in some cases, from those based 
on VC regressions. For example, in comparing the OLS estimates with the VCS 
estimates in Figure 6, we see that for tmthsub the coefficients are quite 
different. Based on OLS, we would conclude that students in "enriched" 
classes, with the other explanatory variables controlled for, perform about 2 
points (13%) higher than those in "normal" or "remedial" classes; the 
conclusion based on the VC regression is that they perform nearly 2.6 points 
(17%) higher. Combining these effects with cost information permits an 
estimation of cost-effectiveness. If enriched classes cost 13% more than 
remedial or normal classes, we would conclude that they were either equally 
cost-effective (OLS) or more cost-effective (VC) than are remedial/normal 
classes, depending on the model. Similarly, if enriched classes cost 17% more 
than remedial/normal classes, they would be either equally cost-effective (VC) 
or less cost-effective (OLS), depending on the model. 
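
The cost-effectiveness comparison above reduces to comparing a relative achievement gain with a relative cost increase. The sketch below restates that reasoning with the figures from the text (gains of 13% under OLS and 17% under VC); the function name and the simple rule of comparing the two percentages are assumptions made for illustration, not a procedure described in the paper.

```python
import math


def cost_effectiveness(gain_pct, extra_cost_pct):
    """Classify enriched classes relative to remedial/normal classes.

    ASSUMPTION: a class type is 'equally cost-effective' when the relative
    gain in achievement equals the relative increase in cost.
    """
    if math.isclose(gain_pct, extra_cost_pct):
        return "equally cost-effective"
    if gain_pct > extra_cost_pct:
        return "more cost-effective"
    return "less cost-effective"


# Estimated gains for enriched classes quoted in the text, by method.
GAIN_PCT = {"OLS": 13.0, "VC": 17.0}

for extra_cost in (13.0, 17.0):
    for method, gain in GAIN_PCT.items():
        verdict = cost_effectiveness(gain, extra_cost)
        print(f"cost +{extra_cost:.0f}%, {method:3s}: {verdict}")
```

The same cost figure leads to different verdicts depending on which estimate is used, which is the practical sense in which the choice between OLS and VC matters.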

However, the caution in the previous subsection about causal 
inference applies equally in this context. Classes, or schools, cannot be 
declared to have enriched curricula by external fiat, simply by supplying the 
outward signs of an enriched curriculum; rather, a whole complex of 
related circumstances has to be arranged, e.g., strengthened education in 
lower grades, synchronization with other subjects, etc. Since we argued 
earlier in the paper that estimates based on VC methods are preferable to 
those based on OLS methods, differences of these types could hold important 
policy implications for schools deciding on the type of curriculum to choose. 

Caveats 

We have noted that alternative models can yield similar predictions 
(in terms of achievement) but might include a different set of variables. 
That such could be the case is not a problem limited to VC models; it is a 
perennial problem with these general types of analyses. In our analysis, we 
included a number of individual pupil and school/classroom variables; in this 
respect, we moved well beyond earlier models, which included only modest 
"intake" characteristics of students. Identifying the variables associated 
with higher outcome scores does not, however, offer a direct answer to the 
principal question of a development agency about the distribution of its 
resources to a set, or continuum, of intervention policies in an educational 
system. Without any prior knowledge of the educational system, any 
justification for an intervention policy based on the results of regression 
(or variance component) analysis, or even of structural modelling (LISREL), 
has no proper foundation. Certain intervention policies may cause a change in 
the educational system, and hence a change in the regression model itself. 
This new regression model may indicate that the selected intervention is far 
from optimal or may even be detrimental. 

A case in point is the pretest score XROT. Its coefficient is 
positive and of substantial magnitude. A conceivable intervention policy to 
raise the XROT scores would be, for example, to provide coaching prior to 
administering the pretest. Clearly such an intervention, if effective, could 
lead to a change in the regression formula. Alternatively, if coaching took 
place between the pretest and posttest, the regression formula would again be 
changed, but differently. Any number of different scenarios is easy to 
construct, in which the coefficient on XROT would be close to 1 or 
substantially lower than .62 (the level obtained in our analysis). 

Similarly, indiscriminate reduction of the time spent maintaining 
order in the classroom, probably a less expensive intervention in monetary 
terms, is likely to be an unreasonable solution. Introduction of the enriched 
mathematics curriculum for all students is most likely not practical, and even 
its extension to a few more classrooms may place excessive requirements on 
staff in the schools that would lower the quality of instruction in other 
subjects and/or other grades. 

In conclusion, positive or negative regression coefficients cannot 
be regarded uncritically as indicators of cause and effect, or influence. An 
intervention should be regarded as an experiment, whose outcome can be 
predicted from an observational study only under the unrealistic assumption 
that the regression formula accurately describes the mechanics of a rigid 
educational process. 

This finding does not mean that absolutely no inferences can be made 
without a carefully designed experiment. It means that the results of a 
statistical analysis based on violated assumptions of randomization should be 
supplemented with external information about the complex selection processes 
and other sources of bias. This adjustment does not submit to a rigorous 
treatment, and therefore we can only speculate how different our results would 
have been had we carried out a (hypothetical) experiment instead of a survey. 

Three important items of information would assist in answering the 
question about the allocation of resources: 

(i) the feasibility and cost of various interventions; 

(ii) how an intervention will affect other explanatory variables 
and which aspects of the educational process will remain 
unaltered after the intervention; 

(iii) how directly manipulable the "interventions" are. 

It is critical to distinguish between the variables that are 
manifest (unchangeable, e.g., pupil background), those that are manipulable 
(e.g., time spent on a task of a particular kind) and those that are 
manipulable only by direct intervention. For example, the time spent 
maintaining discipline is a manipulable variable, but it can be manipulated 
either indirectly (e.g., by making the curriculum more interesting or by 
providing more suitable or more interesting textbooks) or directly (by 
changing teacher behavior so as to ignore disruptive student behavior). 
Considerations as to effective education policy require attention to directly 
manipulable variables. In the present analysis, these are the qualifications 
of the mathematics teachers and the use of textbooks. 